Abstract

The capacity of noncoherent fading relay channels is studied, where the communication
between the transmitter and the receiver is supported by a relay, where the links between the
terminals are fading channels, and where all terminals are aware of the fading statistics, but
not of their realizations. It is shown that if the link between the transmitter and the receiver 
supports higher communication rates than the link between the relay and the receiver, then 
at high SNR it is best to turn the relay off. It is further shown that if the link between the 
transmitter and the relay supports higher communication rates than the link between the relay 
and the receiver, then at high SNR one can achieve communication rates that are within one 
bit of the capacity of the multiple-input single-output fading channel that results when the 
transmitter and the relay can cooperate. 
1 Introduction

We study the capacity of fading relay channels. A relay channel consists of a transmitter, a receiver,
and a relay, which supports the transmitter in communicating with the receiver. The word "fading"
refers to the variation in the strength of the links between these terminals. Coherent fading relay
channels were studied, e.g., in [1], [2]. For such channels, the fading coefficients are available at the
corresponding receiving terminals.

The assumption that the fading coefficients are available at the receiving terminals is commonly 
justified by saying that these coefficients vary slowly over time and can therefore be estimated by 
transmitting training sequences. However, this assumption yields overly optimistic results, since it
is prima facie not clear whether the fading coefficients can be estimated perfectly, and since the
transmission of training sequences reduces the achievable communication rates (training sequences
do not contain information). For instance, in the point-to-point case, where a transmitter communicates
with a receiver (without the aid of a relay), the loss in not knowing the fading coefficient
at the receiver beforehand can be substantial. Indeed, if the fading is regular in the sense that
the present fading coefficient cannot be predicted perfectly from its infinite past, then, at high
signal-to-noise ratio (SNR), the capacity grows double-logarithmically with the SNR [3], which is in
stark contrast to the logarithmic growth in the coherent case [4]. If the fading is nonregular in the
sense that the present fading coefficient can be predicted perfectly from its past, then the capacity 
can grow logarithmically with the SNR, but the pre-log, defined as the limiting ratio of capacity to 
log SNR as SNR tends to infinity, depends on the fading's autocovariance function and is typically 
strictly smaller than one [5]. 

In this paper, we study the capacity of noncoherent fading relay channels with regular fading. 
For such channels the terminals are aware only of the laws of the fading coefficients, but not of their 
realizations. We derive two basic results. First, we show that if the link between the transmitter
and the receiver supports higher communication rates than the link between the relay and the
receiver, then at high SNR it is optimal to turn the relay off. Second, we show that if the link
between the transmitter and the relay supports higher communication rates than the link between
the relay and the receiver, then at high SNR one can achieve communication rates that are within
one bit of the capacity of the multiple-input single-output (MISO) fading channel that results when
the transmitter and the relay can cooperate. Thus, at high SNR the rate penalty for establishing
cooperation between the transmitter and the relay is not greater than one bit.

Figure 1: The relay channel.

Note that we model the fading coefficients as stationary and ergodic stochastic processes whose 
autocovariance functions determine the fading's time-variation. This excludes the so-called block-fading
model introduced by Marzetta and Hochwald [6]. (The block-fading model is not stationary.)
It turns out that, in the point-to-point case, the block-fading model and the stationary and ergodic
fading model yield completely different capacity behaviors at high SNR, cf. [7] and [3, 5]. We expect
that this is also the case for fading relay channels. 

This paper is organized as follows. Section 2 describes the mathematical channel model. Section 3
introduces channel capacity and defines the fading number. Section 4 presents the main
results. Section 5 presents nonasymptotic bounds on the capacity of the relay channel. Section 6 
contains the proof of the upper bound (Theorem 1), and Section 7 contains the proof of the lower 
bound (Theorem 3). Sections 8 and 9 conclude the paper with a discussion and summary of the 
obtained results. 

2 Channel Model 

The relay channel, depicted in Figure 1, consists of three terminals: the transmitter, the receiver,
and the relay. The message $M$ to be transmitted over the relay channel is assumed to be uniformly
distributed over the set $\mathcal{M} = \{1, \ldots, |\mathcal{M}|\}$, where $|\mathcal{M}|$ is a positive integer. The transmitter maps
$M$ to the length-$n$ sequence $X_1^n = X_1, \ldots, X_n$, where $n$ is referred to as the blocklength. Thus,
$X_1^n = \phi_n(M)$ for some mapping $\phi_n\colon \mathcal{M} \to \mathbb{C}^n$ (where $\mathbb{C}$ denotes the set of complex numbers).
At each time instant $k \in \mathbb{Z}$ (where $\mathbb{Z}$ denotes the set of integers), the relay observes $Y_{r,k} \in \mathbb{C}$
and emits the symbol $X_{r,k} \in \mathbb{C}$, which is a function of the previously received symbols $Y_{r,1}^{k-1}$, i.e.,
$X_{r,k} = \varphi_{n,k}(Y_{r,1}^{k-1})$, $k = 1, \ldots, n$ for some mapping $\varphi_{n,k}\colon \mathbb{C}^{k-1} \to \mathbb{C}$. The receiver observes the
channel output symbols $Y_1^n$ from which it guesses $M$. The receiver's guess is denoted by $\hat{M}$, i.e.,
$\hat{M} = \psi_n(Y_1^n)$ for some mapping $\psi_n\colon \mathbb{C}^n \to \mathcal{M}$.

The time-$k$ channel outputs $Y_{r,k}$ and $Y_k$ corresponding to the channel inputs $x_k$ and $x_{r,k}$ are
given by

$$Y_{r,k} = H_{1,k} x_k + Z_{r,k}, \quad k \in \mathbb{Z} \tag{1}$$

$$Y_k = H_{2,k} x_k + H_{3,k} x_{r,k} + Z_k, \quad k \in \mathbb{Z}. \tag{2}$$

Here $\{H_{1,k}, k \in \mathbb{Z}\}$, $\{H_{2,k}, k \in \mathbb{Z}\}$, $\{H_{3,k}, k \in \mathbb{Z}\}$, $\{Z_{r,k}, k \in \mathbb{Z}\}$, and $\{Z_k, k \in \mathbb{Z}\}$ are stationary
and ergodic stochastic processes that take on values in $\mathbb{C}$ and are independent of each other.
Furthermore, $\{H_{1,k}, k \in \mathbb{Z}\}$ and $\{Z_{r,k}, k \in \mathbb{Z}\}$ are of a joint law that does not depend on $\{x_k, k \in \mathbb{Z}\}$,
and $\{H_{2,k}, k \in \mathbb{Z}\}$, $\{H_{3,k}, k \in \mathbb{Z}\}$, and $\{Z_k, k \in \mathbb{Z}\}$ are of a joint law that does not depend on
$\{(x_k, x_{r,k}), k \in \mathbb{Z}\}$.

The additive noise terms $\{Z_k, k \in \mathbb{Z}\}$ and $\{Z_{r,k}, k \in \mathbb{Z}\}$ are both sequences of independent and
identically distributed (i.i.d.), zero-mean, circularly-symmetric, complex Gaussian random variables
of variance $\sigma^2$. The multiplicative noise terms ("fading") $\{H_{1,k}, k \in \mathbb{Z}\}$, $\{H_{2,k}, k \in \mathbb{Z}\}$, and
$\{H_{3,k}, k \in \mathbb{Z}\}$ are zero-mean, unit-variance, stationary and ergodic, circularly-symmetric, complex
Gaussian processes with the respective spectral distribution functions $F_1(\cdot)$, $F_2(\cdot)$, and $F_3(\cdot)$. Thus,
$F_\ell(\cdot)$, $\ell = 1, 2, 3$ are bounded and nondecreasing functions on $[-1/2, 1/2]$ satisfying

$$E\bigl[H_{\ell,k+m} H_{\ell,k}^*\bigr] = \int_{-1/2}^{1/2} e^{i 2\pi m \lambda} \, dF_\ell(\lambda), \quad \ell = 1, 2, 3 \tag{3}$$

where $i = \sqrt{-1}$. We assume a noncoherent channel model where the transmitter, receiver, and relay
are not aware of the realization of the fading processes $\{H_{\ell,k}, k \in \mathbb{Z}\}$, $\ell = 1, 2, 3$ but only of their
joint law. We further assume that the fading processes $\{H_{\ell,k}, k \in \mathbb{Z}\}$, $\ell = 1, 2, 3$ are regular in the
sense that they satisfy

$$\int_{-1/2}^{1/2} \log F_\ell'(\lambda) \, d\lambda > -\infty, \quad \ell = 1, 2, 3 \tag{4}$$

where $F_\ell'(\cdot)$ denotes the derivative of $F_\ell(\cdot)$. (Note that, since $F_\ell(\cdot)$ is monotonic, it is almost
everywhere differentiable. At the discontinuity points of $F_\ell(\cdot)$ the derivative $F_\ell'(\cdot)$ is undefined.)
This implies that the mean-square error in predicting $H_{\ell,0}$ from $H_{\ell,-1}, H_{\ell,-2}, \ldots$, which is given by
[8]

$$\epsilon_\ell^2 = \exp\left(\int_{-1/2}^{1/2} \log F_\ell'(\lambda) \, d\lambda\right) \tag{5}$$

is strictly positive. We also have $\epsilon_\ell^2 \le 1$, $\ell = 1, 2, 3$ since we take $\{H_{\ell,k}, k \in \mathbb{Z}\}$ to have unit
variance. Roughly speaking, we can thus say that a regular process cannot be predicted perfectly
from its past.
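The prediction error in (5) is easy to evaluate numerically. The following is a minimal sketch, assuming the spectral distribution function has a density $F'(\lambda)$ that is supplied as a Python function; the midpoint rule, the function names, and the two example densities are illustrative choices and are not taken from the paper.

```python
import math


def prediction_error(spectral_density, steps=100_000):
    """Approximate eps^2 = exp(int_{-1/2}^{1/2} log F'(lam) dlam) by the midpoint rule."""
    width = 1.0 / steps
    log_integral = 0.0
    for i in range(steps):
        lam = -0.5 + (i + 0.5) * width  # midpoint of the i-th cell
        log_integral += math.log(spectral_density(lam)) * width
    return math.exp(log_integral)


def flat_density(lam):
    """Memoryless fading: F'(lam) = 1 on [-1/2, 1/2]."""
    return 1.0


def two_level_density(lam):
    """Hypothetical unit-variance example: 4 on |lam| < 0.1 and 0.25 elsewhere."""
    return 4.0 if abs(lam) < 0.1 else 0.25


eps2_flat = prediction_error(flat_density)            # no memory, nothing to predict
eps2_two_level = prediction_error(two_level_density)  # memory lowers the error
```

For the flat density the integral of the logarithm is zero, so the prediction error equals one, in line with the remark that $\epsilon_\ell^2 \le 1$ for unit-variance fading.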

We assume that the channel inputs $X_k$ and $X_{r,k}$ satisfy a peak-power constraint, i.e., we have
with probability one

$$|X_k|^2 \le A^2, \quad k \in \mathbb{Z} \tag{6}$$

and

$$|X_{r,k}|^2 \le A_r^2, \quad k \in \mathbb{Z} \tag{7}$$

for some positive reals $A$ and $A_r$. We assume that

$$A_r = \rho A \tag{8}$$

for some $\rho > 0$ (independent of $A$), and we define the SNR as

$$\text{SNR} \triangleq \frac{A^2}{\sigma^2}. \tag{9}$$

Note that the results presented in this paper continue to hold if the peak-power constraints are
replaced by average-power constraints.
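The input-output relations (1) and (2), together with the causal relay map and the peak-power constraint (7), can be sketched in a few lines. The sketch below makes two assumptions that are not part of the model above: the fading is drawn independently at every time instant (the memoryless special case), and the relay uses a hypothetical "resend the last received symbol, clipped to the peak" rule named `clipped_forward`.

```python
import math
import random


def complex_gaussian(rng, variance):
    """Draw one zero-mean, circularly-symmetric complex Gaussian sample."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def clipped_forward(past_relay_outputs, peak_amplitude):
    """Hypothetical relay map: resend the last received symbol, clipped to the peak."""
    if not past_relay_outputs:
        return 0j
    last = past_relay_outputs[-1]
    magnitude = abs(last)
    if magnitude <= peak_amplitude:
        return last
    return last * (peak_amplitude / magnitude)


def simulate_relay_channel(x, relay_peak, sigma2=1.0, seed=0):
    """Simulate (1) and (2) with unit-variance fading that is i.i.d. over time (assumption)."""
    rng = random.Random(seed)
    y_relay, x_relay, y = [], [], []
    for x_k in x:
        # The relay input at time k depends only on the relay outputs before time k.
        x_rk = clipped_forward(y_relay, relay_peak)
        h1 = complex_gaussian(rng, 1.0)
        h2 = complex_gaussian(rng, 1.0)
        h3 = complex_gaussian(rng, 1.0)
        y_relay.append(h1 * x_k + complex_gaussian(rng, sigma2))          # equation (1)
        y.append(h2 * x_k + h3 * x_rk + complex_gaussian(rng, sigma2))    # equation (2)
        x_relay.append(x_rk)
    return y_relay, x_relay, y


peak = 10.0   # A in (6)
rho = 0.5     # A_r = rho * A as in (8)
inputs = [peak, -peak, 1j * peak, -1j * peak] * 5
relay_outputs, relay_inputs, receiver_outputs = simulate_relay_channel(inputs, rho * peak)
```

Fading with memory would require drawing the three fading processes from their spectral distribution functions instead of independently per time instant; that step is left out here to keep the sketch short.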



3 Channel Capacity and Fading Number 

A rate $R(\text{SNR})$ (in nats per channel use) is said to be achievable if for every $\delta > 0$ there exist
mappings $\phi_n$, $(\varphi_{n,1}, \ldots, \varphi_{n,n})$, and $\psi_n$ satisfying (6) and (7) such that

$$\frac{\log |\mathcal{M}|}{n} > R(\text{SNR}) - \delta$$

and such that the error probability $\Pr(\hat{M} \ne M)$ tends to zero as $n$ tends to infinity. (Here $\log(\cdot)$
denotes the natural logarithm function.) The capacity $C(\text{SNR})$ is defined as the supremum of all
achievable rates.¹

We will focus on the asymptotic behavior of capacity at high SNR. In the point-to-point case,
Lapidoth and Moser demonstrated that for regular fading, the capacity satisfies [3, Th. 4.2]

$$\varlimsup_{\text{SNR} \to \infty} \{C(\text{SNR}) - \log\log\text{SNR}\} < \infty \tag{10}$$

where $\varlimsup$ denotes the limit superior. They defined the fading number $\chi$ as [3, Def. 4.6]

$$\chi \triangleq \varlimsup_{\text{SNR} \to \infty} \{C(\text{SNR}) - \log\log\text{SNR}\} \tag{11}$$

and computed its value for different fading channels. For instance, when the fading is a zero-mean,
unit-variance, circularly-symmetric, complex Gaussian process of spectral distribution function $F(\cdot)$,
the fading number is [3, Cor. 4.42]

$$\chi = -1 - \gamma + \log\frac{1}{\epsilon^2} \tag{12}$$

where $\gamma \approx 0.577$ denotes Euler's constant, and where $\epsilon^2$ is defined in (5).

It follows from (11) that, at high SNR, the capacity can be approximated as

$$C(\text{SNR}) \approx \log\log\text{SNR} + \chi. \tag{13}$$

Thus, at high SNR, communication is very power-inefficient, since one should expect to square the
SNR for every additional bit per channel use. Since $\log\log\text{SNR}$ grows very slowly with the SNR,
it follows that, over a large range of SNR, $\log\log\text{SNR}$ does not change much. For example, for
$\text{SNR} \in [30\,\text{dB}, 80\,\text{dB}]$, it is between 2.1 and 3, and the capacity can be approximately bounded by

$$2.1 + \chi \le C(\text{SNR}) \le 3 + \chi, \quad \text{SNR} \in [30\,\text{dB}, 80\,\text{dB}]. \tag{14}$$

This gives rise to the rule of thumb that a system operating at rates considerably larger than
$2 + \chi$ is probably operating in the high-SNR regime and is thus very power-inefficient [9], see also
[10, 5]. The fading number can therefore be viewed as an indication of the maximal rate at which
power-efficient communication is feasible.
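The statement that one must square the SNR for every additional bit follows directly from (13), since $\log\log\text{SNR}^2 = \log 2 + \log\log\text{SNR}$. The short sketch below evaluates the Gaussian fading number (12) and the approximation (13); the function names and the example values of the SNR and of the prediction error are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329


def fading_number(eps2):
    """Fading number (12) of zero-mean Gaussian fading with prediction error eps2, in nats."""
    return -1.0 - EULER_GAMMA + math.log(1.0 / eps2)


def high_snr_capacity(snr, eps2):
    """High-SNR approximation (13): log log SNR plus the fading number, in nats."""
    return math.log(math.log(snr)) + fading_number(eps2)


snr = 1.0e4     # 40 dB, an arbitrary example value
eps2 = 1.0e-2   # an arbitrary example prediction error
gain_from_squaring = high_snr_capacity(snr ** 2, eps2) - high_snr_capacity(snr, eps2)
```

Here `gain_from_squaring` equals $\log 2$ nats, i.e., one bit, regardless of the prediction error, because the fading number cancels in the difference.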

While the fading number indicates at what rates communication is power-inefficient, it should
be noted that it is difficult to determine the SNR at which this happens. Indeed, the fading number
of zero-mean Gaussian fading channels depends on the spectral distribution function $F(\cdot)$ only via
the mean-square error $\epsilon^2$ in predicting the present fading from its past, whereas the SNR at which
(13) becomes accurate depends not only on $\epsilon^2$, but also on $F(\cdot)$ itself [10, 5, 11].²

In the following section, we present results on the fading number of fading relay channels. They 
indicate at what rates a relay channel operates in the power-inefficient regime. 

4 Main Results 

Following Lapidoth and Moser [3], we define the fading number of the relay channel as

$$\chi \triangleq \varlimsup_{\text{SNR} \to \infty} \{C(\text{SNR}) - \log\log\text{SNR}\}. \tag{15}$$

An upper bound follows from the so-called max-flow min-cut upper bound [12, Th. 14.7.1].
Theorem 1 (Upper bound). Consider the above fading relay channel. Then

$$\chi \le \min\left\{-2\gamma + \log\frac{1}{\epsilon_1^2} + \log\frac{1}{\epsilon_2^2},\ \max\left\{-1 - \gamma + \log\frac{1}{\epsilon_2^2},\ -1 - \gamma + \log\frac{1}{\epsilon_3^2}\right\}\right\} \tag{16}$$

which becomes

$$\chi \le -1 - \gamma + \log\frac{1}{\epsilon_2^2} \quad \text{for } \epsilon_2^2 \le \epsilon_3^2. \tag{17}$$

Here $\gamma \approx 0.577$ denotes Euler's constant, and $\epsilon_\ell^2$, $\ell = 1, 2, 3$ are defined in (5).

Proof. See Section 6; equation (17) follows because $\epsilon_1^2 \le 1$ and because $-2\gamma > -1 - \gamma$. □

¹ Note that (6) and (7) are fully characterized by SNR and $\rho$. Furthermore, we will see later that, in the limit
as the SNR tends to infinity, the asymptotic behavior of $C(\text{SNR})$ does not depend on $\rho$. Thus, for a fixed $\rho$, the
high-SNR asymptotic behavior of $C(\text{SNR})$ depends only on the SNR.

² The SNR at which (13) becomes accurate depends on the so-called noisy prediction error [5, Eq. (11)].

Note that $-1 - \gamma - \log\epsilon_2^2$ is the fading number of the fading channel between the transmitter and
the receiver, whereas $-1 - \gamma - \log\epsilon_3^2$ is the fading number of the fading channel between the relay
and the receiver (12). Thus, denoting the fading number of the former channel by $\chi_2$, and denoting
the fading number of the latter channel by $\chi_3$, the upper bound (16) can be further upper-bounded
by

$$\chi \le \max\{\chi_2, \chi_3\}. \tag{18}$$

The right-hand side (RHS) of (18) is the fading number of a multiple-input single-output (MISO)
fading channel with two transmit antennas and one receive antenna, where the fading processes
corresponding to the different transmit antennas are independent, zero-mean, circularly-symmetric,
complex Gaussian processes of spectral distribution function $F_\ell(\cdot)$, $\ell = 2, 3$ [13], see also [11, 14].
Thus, the fading number of the fading relay channel is upper-bounded by the fading number of the
MISO channel that results when the transmitter and the relay can cooperate. In the following, we
shall refer to this channel as the TRC-MISO channel. (Here "TRC" stands for "transmitter-relay
cooperation".)

It follows from (18) that if the fading number of the channel between the transmitter and the
receiver is larger than the fading number of the channel between the relay and the receiver, i.e.,

$$\chi_2 > \chi_3$$

then at high SNR it is optimal to switch the relay off.

Corollary 2. Let the fading processes $\{H_{2,k}, k \in \mathbb{Z}\}$ and $\{H_{3,k}, k \in \mathbb{Z}\}$ satisfy

$$\epsilon_2^2 \le \epsilon_3^2. \tag{19}$$

Then, the fading number of the above relay channel is given by

$$\chi = -1 - \gamma + \log\frac{1}{\epsilon_2^2}. \tag{20}$$

Using a decode-and-forward strategy [15], the following rates are achievable:
Theorem 3 (Lower bound). Consider the above fading relay channel. Then

$$\chi \ge \max\left\{-1 - \gamma + \log\frac{1}{\epsilon_2^2},\ -1 - \gamma + \log\frac{1}{\epsilon_3^2} - \log\left(1 + \frac{\epsilon_1^2}{\epsilon_3^2}\right)\right\}. \tag{21}$$

Proof. See Section 7. □

Denoting the fading number corresponding to the fading $\{H_{1,k}, k \in \mathbb{Z}\}$ by $\chi_1$, the lower bound
(21) can be written as

$$\chi \ge \max\{\chi_2,\ \chi_3 - \log(1 + \exp(\chi_3 - \chi_1))\} = \max\{\chi_2,\ \chi_1 - \log(1 + \exp(\chi_1 - \chi_3))\}. \tag{22}$$

Note that if $\chi_1 \ge \chi_3$, then

$$\log(1 + \exp(\chi_3 - \chi_1)) \le \log 2$$

and the difference between the lower bound (21) and the upper bound (18) is at most one bit.
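The bounds of Theorems 1 and 3 are simple functions of the three prediction errors, so the one-bit gap can be checked numerically. The sketch below implements (16) and (21) as stated above; the function names are illustrative, and the two parameter sets are the ones used for the numerical examples in Section 5.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gaussian_fading_number(eps2):
    """Point-to-point fading number (12), in nats."""
    return -1.0 - EULER_GAMMA + math.log(1.0 / eps2)


def fading_number_upper_bound(eps1, eps2, eps3):
    """Upper bound (16); arguments are the squared prediction errors of H_1, H_2, H_3."""
    broadcast_cut = -2.0 * EULER_GAMMA + math.log(1.0 / eps1) + math.log(1.0 / eps2)
    miso_cut = max(gaussian_fading_number(eps2), gaussian_fading_number(eps3))
    return min(broadcast_cut, miso_cut)


def fading_number_lower_bound(eps1, eps2, eps3):
    """Lower bound (21), achieved by switching the relay off or by decode-and-forward."""
    relay_off = gaussian_fading_number(eps2)
    decode_forward = gaussian_fading_number(eps3) - math.log(1.0 + eps1 / eps3)
    return max(relay_off, decode_forward)


# First scenario of Section 5: the transmitter-relay link is much more predictable.
upper_a = fading_number_upper_bound(1e-4, 1.0, 1e-2)
lower_a = fading_number_lower_bound(1e-4, 1.0, 1e-2)

# Second scenario of Section 5: both relay links are equally predictable.
upper_b = fading_number_upper_bound(1e-2, 1.0, 1e-2)
lower_b = fading_number_lower_bound(1e-2, 1.0, 1e-2)
```

In the first scenario the two bounds differ by about 0.01 nats; in the second they differ by exactly $\log 2$ nats, which is the largest gap possible when $\chi_1 \ge \chi_3$.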
Corollary 4. Let the fading processes $\{H_{1,k}, k \in \mathbb{Z}\}$ and $\{H_{3,k}, k \in \mathbb{Z}\}$ satisfy

$$\epsilon_1^2 \le \epsilon_3^2. \tag{23}$$

Then, the fading number of the above fading relay channel is bounded by

$$\max\left\{-1 - \gamma + \log\frac{1}{\epsilon_2^2},\ -1 - \gamma + \log\frac{1}{\epsilon_3^2}\right\} \ge \chi \ge \max\left\{-1 - \gamma + \log\frac{1}{\epsilon_2^2},\ -1 - \gamma + \log\frac{1}{\epsilon_3^2}\right\} - \log 2. \tag{24}$$

As observed above, for SNRs below 80 dB, the capacity is approximately upper-bounded by

$$C(\text{SNR}) \le 3 + \chi, \quad \text{SNR} \le 80\,\text{dB} \tag{25}$$

so a gap of $\log 2 \approx 0.6931$ nats seems substantial. Nevertheless, for slowly-varying fading channels,
the prediction errors $\epsilon_\ell^2$, $\ell = 1, 2, 3$ are small and the fading number, which depends on $\epsilon_\ell^2$ via
$-\log\epsilon_\ell^2$, is much larger than $\log 2$. For example, for mobile speeds of the order of 5 km/h, prediction
errors $\epsilon_\ell^2$ of roughly $10^{-4}$ seem plausible, see, e.g., [16, Sec. II]. In this case, the fading number is
approximately

$$\chi = -1 - \gamma + \log\frac{1}{\epsilon_\ell^2} \approx 7.6331 \text{ nats} \tag{26}$$

and the RHS of (25) becomes 10.6331 nats. Thus, for slowly-varying fading channels, a gap of one
bit (or equivalently $\log 2$ nats) is reasonably small.

Corollary 4 demonstrates that, when $\chi_1 \ge \chi_3$, the decode-and-forward scheme achieves communication
rates that are within one bit of the capacity of the relay channel. This is consistent with
the Gaussian relay channel where the decode-and-forward scheme achieves rates that are within
one bit of the capacity, too [17, Th. 3.1]. Note that the difference between the lower bound (21)
and the upper bound (18) decreases as $(\chi_1 - \chi_3)$ increases.

Thus, if the fading between the transmitter and the relay can be predicted more accurately than 
the fading between the relay and the receiver, then the fading number of the fading relay channel is 
at most one bit smaller than the fading number of the TRC-MISO channel. If we view the fading 
number as an indication of the rates at which communication is power-inefficient, then this result
demonstrates that the rates at which the fading relay channel and the TRC-MISO channel operate 
in the power-inefficient regime are within one bit. Note however that this does not imply that for 
both channels the power-inefficient regime starts at the same SNR. Indeed, in the following section 
we derive nonasymptotic upper and lower bounds on the capacity of the fading relay channel as 
well as on the capacity of the TRC-MISO channel. These bounds suggest that the capacity of the 
fading relay channel increases much more slowly with the SNR than the capacity of the TRC-MISO 
channel. 



5 Nonasymptotic Bounds 

To simplify the analysis, we assume throughout this section that the channel between the transmitter
and the receiver is memoryless, i.e.,

$$F_2'(\lambda) = 1, \quad -\frac{1}{2} \le \lambda \le \frac{1}{2}$$

which yields $\epsilon_2^2 = 1$. For this case, a nonasymptotic upper bound can be derived by extending [11,
Eq. (16)] (see also [18, Th. 4.2]) to the above fading relay channel, i.e.,

C(SNR) < C IID (SNR) + log(l + ~ to&iHW + dA, SNR > (27) 

where $C_{\text{IID}}(\text{SNR})$ denotes the capacity in the memoryless fading case. It can be upper-bounded by
(cf. [3, Eq. (141)])

C„ D (SNR)< inf J -1 + a log I +loglYa,^) +log5 

a,/3<0, VP/ 



<5>0 



- (1 - a) e<E 1M ) + SNR < 1+ /> + 1 + i <*, 



6 



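where

$$\Gamma(\nu, \xi) \triangleq \int_{\xi}^{\infty} t^{\nu - 1} e^{-t} \, dt, \quad (\nu > 0,\ \xi \ge 0)$$

denotes the incomplete Gamma function, and where

$$\text{Ei}(-x) \triangleq -\int_{x}^{\infty} \frac{e^{-t}}{t} \, dt, \quad x > 0$$

denotes the exponential integral function. Note that the upper bound (27) is also an upper bound
on the capacity of the TRC-MISO channel.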

The next proposition presents a nonasymptotic lower bound on the capacity. It is similar to a 
lower bound that was derived in [18, Prop. 4.1] for single-antenna point-to-point fading channels 
with memory. For point-to-point fading channels, this bound is tight at high SNR in the sense 
that it achieves the fading number. For the fading relay channel, it can be shown that this bound 
achieves the lower bound on the fading number given in Theorem 3. 

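Proposition 5. Let the fading process $\{H_{2,k}, k \in \mathbb{Z}\}$ be memoryless, i.e., let

$$F_2'(\lambda) = 1, \quad -\frac{1}{2} \le \lambda \le \frac{1}{2}. \tag{28}$$

Then, we have

$$C(\text{SNR}) \ge \sup_{0 < \delta, \alpha, \delta_r < 1} \min\bigl\{R_{\text{tr}}(\text{SNR}; \delta, \alpha),\ R_{\text{rr}}(\text{SNR}, \rho; \delta, \alpha, \delta_r)\bigr\}, \quad (\text{SNR} > 0,\ \rho > 0) \tag{29}$$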

where 

/ a 2(l-a) \ rl/2 / ^(l-a) \ 

i? tr (SNR; 8, a) ± log(^^) - j ^ log^(A) + dA 

( o-^-^e \ ./ aW-^e \ 



and 



i? rr (SNR, p; S, a, S r ) 

- log log 1 - l + l 0g f e " 7Ql0g ^)^ SNRag2(a " 1)+e X 
loglog^ 1 + log^ log (^)^ 2S NR 

1/2 / (j2( a— 1) 1 \ 

1/2 lQg ^ (A)+ ^VSNR^ + ^sMj dA 



/_ 



/e-Talog(^)(5 Q SNR Q CT 2 ( Q - 1 ) +e\ ./ e^a log(^)5 Q SNR Q a 2 («- 1 ) + e\ 
~ eXP [ log(^)<5 r/9 2 SNR log(^)5 rP 2 SNR J' (31) 

Proof. See Appendix A. □ 

A lower bound on the capacity of the TRC-MISO channel (but not necessarily the relay channel) 
follows by using a beam-selection strategy, where the transmitter transmits either from the first 
antenna (i.e., the transmitter) or from the second antenna (i.e., the relay). The TRC-MISO capacity 
is thus lower-bounded by [18, Prop. 4.1] 



$$C_{\text{MISO}}(\text{SNR}) \ge \sup_{0 < \delta < 1} \max\bigl\{R_2(\text{SNR}, \rho; \delta),\ R_3(\text{SNR}, \rho; \delta)\bigr\}, \quad (\text{SNR} > 0,\ \rho > 0) \tag{32}$$



where 



i?f(SNR, p; S) 4 log( JSNR( 1 1 + p2) ) - /^log(^(A) + ps J {1 + p2) ) ^ 

~ 6XP ( log(^) ( 5SNR(l+p 2 ) ) E Vlog(^)<5SNR(l + p 2 ) ) ' ' = 2 ' 3 " (33) 
Note that beam-selection is optimal at high SNR in the sense that it achieves the fading number
[13, 11, 14].

While the above lower bounds (29) and (32) are tight at high SNR, they are loose at low SNR. 
We therefore include the following lower bounds that are superior to (29) and (32) when the SNR 
is small. 

A lower bound on the capacity of the TRC-MISO channel follows by using a beam-selection 
strategy, and by lower-bounding the capacity of the resulting point-to-point channel by choosing 
quaternary phase-shift keying (QPSK) channel inputs [19, Prop. 2.1 & Eq. (17)], i.e., 

Cmiso(SNR) >max{/(X;(i? 1 + i/ 1 )X + Z r , 1 | H 1 ),l(X r ;(H 3 + H 3 )X r + Z t | H 3 )} (34) 

where Hi, I = 1,3 and Hi, £ = 1,3 are independent, zero-mean, circularly-symmetric, complex 
Gaussian random variables of variance 1 — e|((l + p 2 )A 2 ) and e|((l + p 2 )A 2 ), respectively; and X 
and X r are uniformly distributed over 

{ v^T+7)a, iya+7)A, - a /o+7)a, -iv/(IT7)A}. 

Here the prediction errors are 

e,(0 = cxp ^ |^ log(V;(A) - -^j dA^J - I £ = 1, 3. (35) 

The RHS of (34) can be computed numerically. 
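One way to carry out such a numerical evaluation is Monte Carlo integration. The sketch below estimates a mutual information of the type appearing in (34): a QPSK input sent over a fading coefficient that splits into a part known to the receiver and an independent unknown residual. It rests on assumptions that are not spelled out above: the residual has variance `eps2` (a number supplied by the caller, standing in for the noisy prediction error), the known part has variance `1 - eps2`, and all function and parameter names are illustrative.

```python
import math
import random


def complex_gaussian(rng, variance):
    """Draw one zero-mean, circularly-symmetric complex Gaussian sample."""
    std = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def qpsk_mutual_information(power, eps2, sigma2=1.0, samples=4000, seed=1):
    """Monte Carlo estimate (in nats) of I(X; (H_known + H_resid) X + Z | H_known)."""
    rng = random.Random(seed)
    amplitude = math.sqrt(power)
    points = [amplitude, 1j * amplitude, -amplitude, -1j * amplitude]
    # All QPSK points have the same magnitude, so the residual fading times the input
    # plus the additive noise is Gaussian with this variance for every input.
    effective_noise = eps2 * power + sigma2
    total = 0.0
    for _ in range(samples):
        x = rng.choice(points)
        h_known = complex_gaussian(rng, 1.0 - eps2)
        y = h_known * x + complex_gaussian(rng, effective_noise)
        d_true = abs(y - h_known * x) ** 2
        # log p(y | x, h) - log p(y | h), with p(y | h) the equal-weight mixture.
        mixture = sum(
            math.exp(-(abs(y - h_known * p) ** 2 - d_true) / effective_noise)
            for p in points
        )
        total += math.log(4.0) - math.log(mixture)
    return total / samples


rate_high_snr = qpsk_mutual_information(power=1.0e4, eps2=1.0e-4)
rate_low_snr = qpsk_mutual_information(power=1.0e-6, eps2=1.0e-4)
```

Each sample of the log-likelihood ratio is at most $\log 4$, so the estimate never exceeds the two bits that QPSK can carry; at low SNR it is close to zero.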

The above lower bound can be extended to the fading relay channel by employing a decode-and-forward
strategy (Proposition 6), and by choosing $\{X_k, k \in \mathbb{Z}\}$ and $\{X_{r,k}, k \in \mathbb{Z}\}$ to be i.i.d.
random variables, independent of each other and with

• $X$ being uniformly distributed over $\{\delta A, i\delta A, -\delta A, -i\delta A\}$, for some $0 < \delta < 1$;

• $X_r$ being uniformly distributed over $\{A, iA, -A, -iA\}$.
This yields

C(SNR) > sup min(/(X; (H[ + H[)X + Z rS I H[), 
o<s<i L 

l(X r ; (H 3 + H 3 )X r + H 2A X + Z x \ H 3 , X x ) } (36) 

where H 3 and H 3 are as above, and H[ and H[ are independent, zero-mean, circularly-symmetric, 
complex Gaussian random variables of variance 1 — e 2 (<5 2 A 2 ) and e 2 (<5 2 A 2 ), respectively. The RHS 
of (36) can be computed numerically.

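We evaluate the upper bound (27) and the lower bounds (29), (32), (34), and (36) for spectral
distribution functions of the form

$$F_\ell'(\lambda) = \begin{cases} \Upsilon_\ell, & |\lambda| < \Lambda_\ell \\ \Delta_\ell, & \Lambda_\ell \le |\lambda| \le \frac{1}{2} \end{cases} \qquad \ell = 1, 3 \tag{37}$$

where $\Upsilon_\ell > 0$, $\Delta_\ell > 0$, and $0 < \Lambda_\ell < 1/2$ satisfy

$$\int_{-1/2}^{1/2} F_\ell'(\lambda) \, d\lambda = 2\Upsilon_\ell \Lambda_\ell + (1 - 2\Lambda_\ell)\Delta_\ell = 1, \quad \ell = 1, 3. \tag{38}$$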

(Recall that $F_2'(\lambda) = 1$, $-1/2 \le \lambda \le 1/2$.) We shall consider two scenarios: in the first scenario, the
fading between the transmitter and the relay has a prediction error of $10^{-4}$, whereas the fading
between the relay and the receiver has a prediction error of $10^{-2}$. This implies that the fading
number of the relay channel is roughly the same as the fading number of the TRC-MISO channel.
In the second scenario, both the fading between the transmitter and the relay and the fading
between the relay and the receiver have a prediction error of $10^{-2}$. In this case, the lower bound
on the fading number of the relay channel (21) is $\log 2$ nats smaller than the fading number of the
TRC-MISO channel.

[Figure 2: two panels plotted against SNR [dB]. Top panel: $\epsilon_1^2 = 10^{-4}$, $\epsilon_2^2 = 1$, $\epsilon_3^2 = 10^{-2}$. Bottom panel: $\epsilon_1^2 = 10^{-2}$, $\epsilon_2^2 = 1$, $\epsilon_3^2 = 10^{-2}$.]

Figure 2: Upper bound on the capacity of the TRC-MISO channel (27); lower bounds on the
capacity of the TRC-MISO channel [maximum of (32) and (34)] and on the capacity of the fading
relay channel [maximum of (29) and (36)]; the fading number of the TRC-MISO channel and the
lower bound (21) on the fading number of the fading relay channel. The prediction errors are
$\epsilon_1^2 = 10^{-4}$, $\epsilon_2^2 = 1$, $\epsilon_3^2 = 10^{-2}$ (top figure) and $\epsilon_1^2 = 10^{-2}$, $\epsilon_2^2 = 1$, $\epsilon_3^2 = 10^{-2}$ (bottom figure).

Figure 2 shows the upper bound on the capacity of the fading relay channel and of the TRC-MISO
channel (27), the lower bounds on the capacity of the fading relay channel [maximum of
(29) and (36)] and of the TRC-MISO channel [maximum of (32) and (34)], together with the
corresponding fading numbers for the above two scenarios. In particular, the top figure in Figure 2
shows the bounds (27), (29), (32), (34), and (36) for

$$\begin{aligned} \Upsilon_1 &\approx 5.76034 & \Upsilon_3 &\approx 10.99684 \\ \Delta_1 &= 10^{-5} \quad \text{and} & \Delta_3 &= 0.005 \\ \Lambda_1 &\approx 0.08679 & \Lambda_3 &\approx 0.04503 \end{aligned}$$

resulting in $\epsilon_1^2 = 10^{-4}$ and $\epsilon_3^2 = 10^{-2}$. In this case, the fading number is upper-bounded by (16)

$$\chi \le -1 - \gamma + \log\frac{1}{\epsilon_3^2} \approx 3.0280 \tag{39}$$

which is equal to the fading number of the TRC-MISO channel. The fading number of the relay
channel is lower-bounded by (21)

$$\chi \ge -1 - \gamma + \log\frac{1}{\epsilon_3^2} - \log\left(1 + \frac{\epsilon_1^2}{\epsilon_3^2}\right) \approx 3.0180. \tag{40}$$

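The bottom figure in Figure 2 shows the bounds (27), (29), (32), (34), and (36) for

$$\begin{aligned} \Upsilon_1 &= \Upsilon_3 \approx 10.99684 \\ \Delta_1 &= \Delta_3 = 0.005 \\ \Lambda_1 &= \Lambda_3 \approx 0.04503 \end{aligned}$$

resulting in $\epsilon_1^2 = \epsilon_3^2 = 10^{-2}$. As in the above example, for the relay channel this yields

$$\chi \le -1 - \gamma + \log\frac{1}{\epsilon_3^2} \approx 3.0280 \tag{41}$$

which is equal to the fading number of the TRC-MISO fading channel. The lower bound (21)
becomes

$$\chi \ge -1 - \gamma + \log\frac{1}{\epsilon_3^2} - \log 2 \approx 2.3348. \tag{42}$$

In both cases we assume that $\rho = 1$ and $\sigma^2 = 1$.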
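The link between the spectrum parameters and the quoted prediction errors can be checked in closed form. The sketch below assumes that the spectral density takes one constant value on a band $|\lambda| < \text{band\_edge}$ and a second constant value on the rest of $[-1/2, 1/2]$, which is how the three printed parameters per fading process are read here; the function and argument names are illustrative.

```python
import math


def two_level_spectrum(level_in, level_out, band_edge):
    """Return (total mass, prediction error) of a two-level spectral density.

    Assumed density: level_in for |lam| < band_edge, level_out for the rest of [-1/2, 1/2].
    """
    mass = 2.0 * band_edge * level_in + (1.0 - 2.0 * band_edge) * level_out
    # Equation (5): the exponential of the integral of the log-density.
    log_integral = (2.0 * band_edge * math.log(level_in)
                    + (1.0 - 2.0 * band_edge) * math.log(level_out))
    return mass, math.exp(log_integral)


# Rounded parameter values as printed for the two numerical examples.
mass_1, eps2_1 = two_level_spectrum(5.76034, 1.0e-5, 0.08679)
mass_3, eps2_3 = two_level_spectrum(10.99684, 0.005, 0.04503)
```

With these rounded values the prediction errors come out within one percent of $10^{-4}$ and $10^{-2}$, and the total mass is within about one percent of the unit variance required by the model.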

Note that for the above spectral distribution functions the fading processes $\{H_{1,k}, k \in \mathbb{Z}\}$ and
$\{H_{3,k}, k \in \mathbb{Z}\}$ are nonephemeral in the sense that [20, Def. 2.1]

$$\int_{-1/2}^{1/2} \bigl(F_\ell'(\lambda)\bigr)^2 \, d\lambda > 2, \quad \ell = 1, 3. \tag{43}$$

In this case i.i.d. inputs and QPSK as well as beam-selection achieve the low-SNR asymptotic
capacity [20, Secs. II-A3 & II-B4]. Thus, for the above spectral distribution functions, the lower
bound (34) is tight at low SNR.

We observe that the lower bound for the fading relay channel (29) increases much more slowly 
with the SNR than the lower bound for the TRC-MISO channel (32), even in the first example 
where the fading numbers of both channels are almost identical. Since these lower bounds are tight 
at high SNR, we suspect that the same is also true for the capacities of both channels at high SNR. 
Thus, even though the capacities of the fading relay channel and the TRC-MISO channel have 
similar asymptotic behaviors in the limit as the SNR tends to infinity, they may differ substantially 
at finite SNR. 
6 Proof of Upper Bound

Theorem 1 follows from Fano's inequality [12, Th. 2.11.1] and from the following upper bound on
the capacity:

$$C(\text{SNR}) \le \varlimsup_{n \to \infty} \sup \frac{1}{n} I(X_1^n; Y_1^n) \le \min\left\{\varlimsup_{n \to \infty} \sup \frac{1}{n} I(X_1^n; Y_{r,1}^n, Y_1^n),\ \varlimsup_{n \to \infty} \sup \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n)\right\} \tag{44}$$

where the suprema are over all joint distributions on $(X_1^n, X_{r,1}^n)$ satisfying the power constraints
(6) and (7). Here the second step follows by upper-bounding

$$I(X_1^n; Y_1^n) \le I(X_1^n; Y_{r,1}^n, Y_1^n) \quad \text{and} \quad I(X_1^n; Y_1^n) \le I(X_1^n, X_{r,1}^n; Y_1^n)$$

which in turn follows because $I(A; B) \le I(A; B, C)$ for all random variables $A$, $B$, and $C$.

The first term on the RHS of (44) is upper-bounded by the capacity of a single-input multiple-output
(SIMO) fading channel with peak-power $A^2$, whereas the second term on the RHS of (44) is
upper-bounded by the capacity of a MISO fading channel with peak-power $A^2 + A_r^2$, which by (8)
is equal to $A^2(1 + \rho^2)$. Indeed, we can upper-bound the first term on the RHS of (44) as follows:

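$$\begin{aligned}
\frac{1}{n} I(X_1^n; Y_{r,1}^n, Y_1^n)
&\le \frac{1}{n} I(X_1^n; Y_{r,1}^n, Y_1^n \mid H_{3,1}^n) \\
&= \frac{1}{n} \sum_{k=1}^{n} I(X_1^n; Y_{r,k}, Y_k \mid H_{3,1}^n, Y_{r,1}^{k-1}, Y_1^{k-1}) \\
&= \frac{1}{n} \sum_{k=1}^{n} I(X_1^k; Y_{r,k}, Y_k \mid H_{3,1}^k = 0, Y_{r,1}^{k-1}, Y_1^{k-1}) \\
&\le \frac{1}{n} \sum_{k=1}^{n} I(X_1^k, Y_{r,1}^{k-1}, Y_1^{k-1}; Y_{r,k}, Y_k \mid H_{3,1}^k = 0) \\
&\le \frac{1}{n} \sum_{k=1}^{n} I(X_k, H_{1,1}^{k-1}, H_{2,1}^{k-1}; Y_{r,k}, Y_k \mid H_{3,1}^k = 0) \\
&\le \frac{1}{n} \sum_{k=1}^{n} \sup I(X_k; Y_{r,k}, Y_k \mid H_{3,k} = 0) + \frac{1}{n} \sum_{k=1}^{n} I(H_{1,1}^{k-1}, H_{2,1}^{k-1}; Y_{r,k}, Y_k \mid X_k, H_{3,k} = 0) \\
&\le \sup I(X_1; Y_{r,1}, Y_1 \mid H_{3,1} = 0) + \frac{1}{n} \sum_{k=1}^{n} I(H_{1,1}^{k-1}, H_{2,1}^{k-1}; Y_{r,k}, Y_k \mid X_k, H_{3,k} = 0)
\end{aligned} \tag{45}$$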

where the first supremum is over all input distributions on $X_k$ satisfying (6), and the second supremum
is over all input distributions on $X_1$ satisfying (6). Here the first step follows because $X_1^n$
is independent of $H_{3,1}^n$; the second step follows from the chain rule for mutual information [12,
Th. 2.5.2]; the third step follows because $X_{r,\ell}$ is a function of $Y_{r,1}^{\ell-1}$, so $(H_{3,1}^k, X_{r,1}^k)$ is known and
we can therefore subtract $H_{3,\ell} X_{r,\ell}$ from $Y_\ell$, $\ell = 1, \ldots, k$, resulting in the same mutual information
as if we would set $H_{3,1}^k = 0$; the fourth step follows because $I(A; B \mid C) \le I(A, C; B)$ for any random
variables $A$, $B$, and $C$ (this is a consequence of the chain rule for mutual information and of the
nonnegativity of mutual information); the fifth step follows by adding the observations $(H_{1,1}^{k-1}, H_{2,1}^{k-1})$,
and by noting that, conditional on $(X_k, H_{3,1}^k = 0, H_{1,1}^{k-1}, H_{2,1}^{k-1})$, the pair $(Y_{r,k}, Y_k)$ is independent
of $(X_1^{k-1}, Y_{r,1}^{k-1}, Y_1^{k-1})$; the sixth step follows from the chain rule for mutual information and by
upper-bounding each summand in the first sum by its supremum; and the last step follows from the
stationarity of the channel, which implies that $\sup I(X_k; Y_{r,k}, Y_k \mid H_{3,k} = 0)$ does not depend on
$k$.

The first term on the RHS of (45) is the capacity of a memoryless SIMO fading channel, which
is given by [3, Cor. 4.32]

$$\sup I(X_1; Y_{r,1}, Y_1 \mid H_{3,1} = 0) = \log\log\text{SNR} - 2\gamma + o(1) \tag{46}$$

where $o(1)$ tends to zero as $\text{SNR} \to \infty$. The second term on the RHS of (45) can be upper-bounded
by

$$\begin{aligned}
\frac{1}{n} \sum_{k=1}^{n} I(H_{1,1}^{k-1}, H_{2,1}^{k-1}; Y_{r,k}, Y_k \mid X_k, H_{3,k} = 0)
&\le \frac{1}{n} \sum_{k=1}^{n} I(H_{1,1}^{k-1}, H_{2,1}^{k-1}; H_{1,k}, H_{2,k}) \\
&= \frac{1}{n} \sum_{k=1}^{n} \Bigl[ I(H_{1,1}^{k-1}; H_{1,k}) + I(H_{2,1}^{k-1}; H_{2,k}) \Bigr]
\end{aligned} \tag{47}$$

where the first step follows by adding the observations $(H_{1,k}, H_{2,k})$ and by noting that, conditioned
on $(X_k, H_{3,k} = 0, H_{1,k}, H_{2,k})$, the pair $(Y_{r,k}, Y_k)$ is independent of $(H_{1,1}^{k-1}, H_{2,1}^{k-1})$; and the second
step follows because the processes $\{H_{1,k}, k \in \mathbb{Z}\}$ and $\{H_{2,k}, k \in \mathbb{Z}\}$ are independent.
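By Cesàro's mean [12, Th. 4.2.3], it follows that

$$\begin{aligned}
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \Bigl[ I(H_{1,1}^{k-1}; H_{1,k}) + I(H_{2,1}^{k-1}; H_{2,k}) \Bigr]
&= \lim_{k \to \infty} I(H_{1,1}^{k-1}; H_{1,k}) + \lim_{k \to \infty} I(H_{2,1}^{k-1}; H_{2,k}) \\
&= \log\frac{1}{\epsilon_1^2} + \log\frac{1}{\epsilon_2^2}.
\end{aligned} \tag{48}$$

Here the second step follows because $\{H_{1,k}, k \in \mathbb{Z}\}$ and $\{H_{2,k}, k \in \mathbb{Z}\}$ are unit-variance Gaussian
processes whose conditional variances, conditioned on the past $(k - 1)$ fading coefficients, tend to $\epsilon_1^2$
and $\epsilon_2^2$ as $k$ tends to infinity [21, Lemmas 5.7(b) & 5.10(c)]. Combining (45)-(48), we thus obtain

$$\varlimsup_{n \to \infty} \sup \frac{1}{n} I(X_1^n; Y_{r,1}^n, Y_1^n) \le \log\log\text{SNR} - 2\gamma + \log\frac{1}{\epsilon_1^2} + \log\frac{1}{\epsilon_2^2} + o(1). \tag{49}$$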
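The Cesàro-mean step in (48) can be illustrated with a fading process for which everything is available in closed form. The sketch below assumes a unit-variance Gauss-Markov (first-order autoregressive) fading process with one-step correlation `correlation`; this process is an illustrative choice and not one of the spectra used in the paper. For it, the conditional variance given the past equals $1 - |a|^2$ as soon as one past sample is available, so the prediction error is $\epsilon^2 = 1 - |a|^2$.

```python
import math


def mutual_information_with_past(k, correlation):
    """I(H_1^{k-1}; H_k) in nats for a unit-variance Gauss-Markov fading process."""
    if k == 1:
        return 0.0  # there is no past to learn from
    return math.log(1.0 / (1.0 - abs(correlation) ** 2))


def cesaro_mean(n, correlation):
    """Average of I(H_1^{k-1}; H_k) over k = 1, ..., n, as on the left of (48)."""
    terms = (mutual_information_with_past(k, correlation) for k in range(1, n + 1))
    return sum(terms) / n


correlation = 0.9
limit = math.log(1.0 / (1.0 - correlation ** 2))   # log(1 / eps^2)
averages = [cesaro_mean(n, correlation) for n in (10, 100, 10_000)]
```

The running averages increase towards $\log(1/\epsilon^2)$, which is the value the individual terms converge to, as Cesàro's mean requires.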

To evaluate the second term on the RHS of (44), we first note that, by (6), (7), and (8), the
channel inputs $X_k$ and $X_{r,k}$ satisfy

$$|X_k|^2 + |X_{r,k}|^2 \le A^2(1 + \rho^2), \quad k \in \mathbb{Z} \tag{50}$$

with probability one. By maximizing over all joint distributions on $(X_1^n, X_{r,1}^n)$ satisfying (50) [rather
than (6) and (7)], it follows that $\varlimsup_{n \to \infty} \sup \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n)$ is upper-bounded by the capacity of
a MISO fading channel with fading processes $\{H_{2,k}, k \in \mathbb{Z}\}$ and $\{H_{3,k}, k \in \mathbb{Z}\}$ and with peak-power
constraint $A^2(1 + \rho^2)$, i.e., we have [13, Cor. 8] (see also [11, Cor. 8], [14, Cor. 5.6]),

$$\varlimsup_{n \to \infty} \sup \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n) \le \log\log\bigl(\text{SNR}(1 + \rho^2)\bigr) - 1 - \gamma + \max\left\{\log\frac{1}{\epsilon_2^2},\ \log\frac{1}{\epsilon_3^2}\right\} + o(1). \tag{51}$$

Combining (49) and (51) with (44), we obtain

$$C(\text{SNR}) \le \min\left\{\log\log\text{SNR} - 2\gamma + \log\frac{1}{\epsilon_1^2} + \log\frac{1}{\epsilon_2^2} + o(1),\ \log\log\bigl(\text{SNR}(1 + \rho^2)\bigr) - 1 - \gamma + \max\left\{\log\frac{1}{\epsilon_2^2},\ \log\frac{1}{\epsilon_3^2}\right\} + o(1)\right\}. \tag{52}$$

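Computing the difference $(C(\text{SNR}) - \log\log\text{SNR})$ in the limit as the SNR tends to infinity, and
noting that

$$\lim_{\text{SNR} \to \infty} \bigl\{\log\log\bigl(\text{SNR}(1 + \rho^2)\bigr) - \log\log\text{SNR}\bigr\} = 0$$

it follows that

$$\chi \le \min\left\{-2\gamma + \log\frac{1}{\epsilon_1^2} + \log\frac{1}{\epsilon_2^2},\ \max\left\{-1 - \gamma + \log\frac{1}{\epsilon_2^2},\ -1 - \gamma + \log\frac{1}{\epsilon_3^2}\right\}\right\}. \tag{53}$$

This proves Theorem 1.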
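The limit used in the last step holds because $\log\log(\text{SNR}(1 + \rho^2)) - \log\log\text{SNR} = \log\bigl(1 + \log(1 + \rho^2)/\log\text{SNR}\bigr)$, which vanishes only as fast as $1/\log\text{SNR}$. The following sketch tabulates this gap; the function name and the chosen SNR values are illustrative.

```python
import math


def loglog_gap(snr_db, rho):
    """log log(SNR (1 + rho^2)) - log log SNR, in nats, for an SNR given in dB."""
    snr = 10.0 ** (snr_db / 10.0)
    return math.log(math.log(snr * (1.0 + rho ** 2))) - math.log(math.log(snr))


# Gap for rho = 1 at increasing SNR; it shrinks, but slowly.
gaps = [loglog_gap(snr_db, rho=1.0) for snr_db in (20, 40, 80, 160)]
```

The slow decay is one reason why the fading number alone does not reveal the SNR at which the asymptotic behavior sets in.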
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7 Proof of Lower Bound 



In the following we prove Theorem 3. The first term in (21) is the fading number of the channel 
between the transmitter and receiver [3, Cor. 4.42] and follows by switching the relay off. In the 
following, we prove the remaining terms. They are based on the following proposition. 

Proposition 6. Consider the above fading relay channel. Then the rate 

R = lim sup - min{/(X{ 1 ; F™! | X? A ),l(X?, X^; Y") } (54) 
is achievable. Here the supremum is over all product distributions on (X™ ,X™ A ), i.e., 

n 

Px?,Xn i (-) = l[Px,X r (-) 
fe=l 

satisfying the power constraints (6) and (7). 

Proof. See Appendix B. □ 

Note that (54) is an extension of the decode-and-forward (DF) scheme [15, Th. 1] to channels 
with memory. 

Theorem 3 follows from Proposition 6 upon choosing $\{X_k, k \in \mathbb{Z}\}$ and $\{X_{r,k}, k \in \mathbb{Z}\}$ to be i.i.d., circularly-symmetric random variables, independent of each other and with

$$\log|X_k|^2 \sim \mathcal{U}\bigl([\log\log A^2, \log A^{2\alpha}]\bigr), \quad k \in \mathbb{Z} \tag{55}$$
$$\log|X_{r,k}|^2 \sim \mathcal{U}\bigl([\log A_r^{2\beta}, \log A_r^2]\bigr), \quad k \in \mathbb{Z} \tag{56}$$

for some $0 < \alpha < \beta < 1$. Here $\mathcal{U}(\mathcal{A})$ denotes the uniform distribution over the set $\mathcal{A}$.

Before we set out to prove Theorem 3, we pause for intuition. Recall that if the channel between the transmitter and the receiver has a larger fading number than the channel between the relay and the receiver, then it is optimal to switch the relay off, i.e., to set $X_{r,k} = 0$, $k \in \mathbb{Z}$. This happens if $\chi_2 > \chi_3$, and we therefore focus on the case where $\chi_3 > \chi_2$. Since every signal sent from the transmitter to the relay also interferes at the receiver, there is a trade-off between achieving high data rates from the transmitter to the relay (requiring a large transmit power) and minimizing the interference at the receiver (requiring a low transmit power). We address this problem by choosing $P_{X,X_r}(\cdot)$ such that $|X_k|^2/|X_{r,k}|^2$ vanishes as $A^2$ tends to infinity, i.e., by allowing the relay a much larger transmit power than the transmitter.

The input distribution (55) and (56) chooses $X_k$ and $X_{r,k}$ independently and trades rates from the transmitter to the relay against rates from the relay to the receiver by using the parameters $\alpha$ and $\beta$. For instance, increasing $\alpha$ allows for larger rates between the transmitter and the relay, but requires a larger $\beta$ (since we have $\beta > \alpha$), decreasing the rates achievable between the relay and the receiver.
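
To make the input distribution concrete, the following is a minimal sketch that draws pairs with the log-uniform magnitudes of (55) and (56) and uniform phases. The function name, the numerical values of $A$, $\rho$, $\alpha$, $\beta$, and the use of natural logarithms are assumptions made only for illustration; the relation $A_r = \rho A$ is taken from (8).

```python
import cmath
import math
import random


def sample_input_pair(A, rho, alpha, beta, rng):
    """Draw one pair (X_k, X_{r,k}) with the log-uniform magnitudes of (55) and (56)."""
    log_A2 = math.log(A ** 2)
    log_Ar2 = math.log((rho * A) ** 2)  # assumes A_r = rho * A, as in (8)
    # (55): log|X_k|^2 is uniform on [log log A^2, alpha * log A^2]
    log_p = rng.uniform(math.log(log_A2), alpha * log_A2)
    # (56): log|X_{r,k}|^2 is uniform on [beta * log A_r^2, log A_r^2]
    log_pr = rng.uniform(beta * log_Ar2, log_Ar2)
    # Circular symmetry: independent phases, uniform on [0, 2*pi)
    x = cmath.rect(math.exp(log_p / 2), rng.uniform(0.0, 2 * math.pi))
    x_r = cmath.rect(math.exp(log_pr / 2), rng.uniform(0.0, 2 * math.pi))
    return x, x_r


# Hypothetical parameter values, chosen only so that log log A^2 < alpha * log A^2.
rng = random.Random(0)
A, rho, alpha, beta = 1e3, 1.0, 0.3, 0.6
pairs = [sample_input_pair(A, rho, alpha, beta, rng) for _ in range(10_000)]
worst_ratio = max(abs(x) ** 2 / abs(x_r) ** 2 for x, x_r in pairs)
```

By construction, every sample satisfies $|X_k|^2 \ge \log A^2$ and $|X_k|^2/|X_{r,k}|^2 \le A^{2\alpha}/A_r^{2\beta}$, which are the two properties the proof relies on.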

7.1 Lower Bound on $\lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n)$

We first lower-bound the first term on the RHS of (54). We have

$$\begin{aligned}
I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) &= I(X_1^n; Y_{r,1}^n) \\
&= \sum_{k=1}^{n} I(X_k; Y_{r,1}^n \mid X_1^{k-1}) \\
&\ge \sum_{k=\kappa+1}^{n} I(X_k; Y_{r,1}^n \mid X_1^{k-1})
\end{aligned} \tag{57}$$

for every $0 < \kappa < n$. Here the first step follows because $X_1^n$ and $X_{r,1}^n$ are independent, and because $X_1^n$ and $X_{r,1}^n$ are also independent when conditioned on $Y_{r,1}^n$; the second step follows from the chain rule for mutual information; and the third step follows from the nonnegativity of mutual information. Using that $\{X_k, k \in \mathbb{Z}\}$ is i.i.d. and that reducing observations does not increase mutual information, we obtain

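$$\begin{aligned}
I(X_k; Y_{r,1}^n \mid X_1^{k-1}) &\ge I(X_k; Y_{r,1}^k \mid X_1^{k-1}) \\
&= I(X_k; Y_{r,1}^k, X_1^{k-1}) \\
&\ge I(X_k; Y_{r,k-\kappa}^k, X_{k-\kappa}^{k-1}) \\
&= I(X_k; Y_{r,k-\kappa}^k, H_{1,k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}) - \varepsilon_1(\mathrm{SNR}, \kappa)
\end{aligned} \tag{58}$$

where $\varepsilon_1$ is defined as

$$\begin{aligned}
\varepsilon_1(\mathrm{SNR}, \kappa) &\triangleq I(X_k; Y_{r,k-\kappa}^k, H_{1,k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}) - I(X_k; Y_{r,k-\kappa}^k, X_{k-\kappa}^{k-1}) \\
&= I(X_k; H_{1,k-\kappa}^{k-1} \mid Y_{r,k-\kappa}^k, X_{k-\kappa}^{k-1}).
\end{aligned}$$

Note that, due to the stationarity of the channel and of the proposed coding scheme, $\varepsilon_1(\mathrm{SNR}, \kappa)$ does not depend on $k$. Furthermore, it follows from [3, App. IX] that for every fixed $\kappa$

$$\lim_{\mathrm{SNR}\to\infty} \varepsilon_1(\mathrm{SNR}, \kappa) = 0. \tag{59}$$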



We further lower-bound the RHS of (58) by

$$\begin{aligned}
I(X_k; Y_{r,k-\kappa}^k, H_{1,k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}) &\ge I(X_k; Y_{r,k}, H_{1,k-\kappa}^{k-1}) \\
&= I(X_k; Y_{r,k} \mid H_{1,k-\kappa}^{k-1})
\end{aligned} \tag{60}$$

which follows because reducing observations does not increase mutual information, and because $I(A; B, C) \ge I(A; B \mid C)$ for any random variables $A$, $B$, and $C$.
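We express the fading coefficient $H_{1,k}$ at time $k$ as

$$H_{1,k} = \bar{H}_{1,k} + \tilde{H}_{1,k}$$

where $\bar{H}_{1,k} = \mathsf{E}\bigl[H_{1,k} \bigm| H_{1,k-\kappa}^{k-1}\bigr]$ is the best predictor of $H_{1,k}$ given $H_{1,k-1}, \ldots, H_{1,k-\kappa}$, and where $\tilde{H}_{1,k}$ denotes the prediction error. Note that, since $\{H_{1,k}, k \in \mathbb{Z}\}$ is a zero-mean, complex Gaussian process, it follows that also $\bar{H}_{1,k}$ and $\tilde{H}_{1,k}$ are zero-mean, complex Gaussian random variables with variance $1 - \epsilon_{1,\kappa}^2$ and $\epsilon_{1,\kappa}^2$, respectively. Further note that $\tilde{H}_{1,k}$ is independent of $H_{1,k-\kappa}^{k-1}$ [21, Lemma 5.8], and that [21, Lemmas 5.7(b) & 5.10(c)], [8]

$$\lim_{\kappa\to\infty} \epsilon_{1,\kappa}^2 = \exp\biggl( \int_{-1/2}^{1/2} \log F_1'(\lambda) \,\mathrm{d}\lambda \biggr). \tag{61}$$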
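
As a numerical illustration of (61), the sketch below evaluates the integral by a midpoint rule and checks it on a first-order autoregressive fading process, for which the one-step prediction error is known in closed form. The AR(1) model, its coefficient, and the function names are assumptions chosen only for this example; they are not part of the channel model above.

```python
import cmath
import math


def prediction_error(spectral_density, num_points=20_000):
    """Evaluate exp( integral over [-1/2, 1/2] of log F'(lambda) ) by a midpoint rule."""
    total = 0.0
    for i in range(num_points):
        lam = (i + 0.5) / num_points - 0.5  # midpoint of the i-th subinterval
        total += math.log(spectral_density(lam))
    return math.exp(total / num_points)


def ar1_spectral_density(a):
    """Spectral density of the unit-variance process H_k = a*H_{k-1} + sqrt(1 - a^2)*W_k."""
    def density(lam):
        return (1 - a * a) / abs(1 - a * cmath.exp(-2j * math.pi * lam)) ** 2
    return density


# Hypothetical example: for a = 0.8 the innovation variance is 1 - a^2 = 0.36.
eps_sq = prediction_error(ar1_spectral_density(0.8))
```

For this process the value returned by `prediction_error` should agree with $1 - a^2$ up to numerical error, since only the innovation term is unpredictable from the infinite past.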



With this, we obtain

$$\begin{aligned}
I(X_k; Y_{r,k} \mid H_{1,k-\kappa}^{k-1})
&= h\bigl((\bar{H}_{1,k} + \tilde{H}_{1,k})X_k + Z_{r,k} \bigm| H_{1,k-\kappa}^{k-1}\bigr) - h\bigl((\bar{H}_{1,k} + \tilde{H}_{1,k})X_k + Z_{r,k} \bigm| \bar{H}_{1,k}, X_k\bigr) \\
&\ge h\bigl(H_{1,k}X_k + Z_{r,k} \bigm| H_{1,k}, Z_{r,k}\bigr) - h\bigl((\bar{H}_{1,k} + \tilde{H}_{1,k})X_k + Z_{r,k} \bigm| \bar{H}_{1,k}, X_k\bigr) \\
&= h\bigl(H_{1,k}X_k \bigm| H_{1,k}\bigr) - h\bigl(\tilde{H}_{1,k}X_k + Z_{r,k} \bigm| X_k\bigr) \\
&= \mathsf{E}\bigl[\log|H_{1,k}|^2\bigr] + h(X_k) - \mathsf{E}\bigl[\log|X_k|^2\bigr] - h\Bigl(\tilde{H}_{1,k} + \frac{Z_{r,k}}{X_k} \Bigm| X_k\Bigr) \\
&= \mathsf{E}\bigl[\log|H_{1,k}|^2\bigr] + h(X_k) - \mathsf{E}\bigl[\log|X_k|^2\bigr] - \log\pi - 1 - \mathsf{E}\Bigl[\log\Bigl(\epsilon_{1,\kappa}^2 + \frac{\sigma^2}{|X_k|^2}\Bigr)\Bigr] \\
&\ge \mathsf{E}\bigl[\log|H_{1,k}|^2\bigr] + h(X_k) - \mathsf{E}\bigl[\log|X_k|^2\bigr] - \log\pi - 1 - \log\Bigl(\epsilon_{1,\kappa}^2 + \frac{\sigma^2}{\log A^2}\Bigr)
\end{aligned} \tag{62}$$

where the second step follows because conditioning does not increase entropy; the third step follows from the behavior of differential entropy under translation [12, Th. 9.6.3]; the fourth step follows from the behavior of differential entropy under scaling by a complex number [12, Th. 9.6.4]; the fifth step follows because, conditioned on $X_k$, the random variable $\tilde{H}_{1,k} + Z_{r,k}/X_k$ is Gaussian; and the last step follows because, for our choice of input distribution (55), we have $|X_k|^2 \ge \log A^2$ with probability one.

The first term on the RHS of (62) can be evaluated as [22, Sec. 4.331]

$$\mathsf{E}\bigl[\log|H_{1,k}|^2\bigr] = -\gamma. \tag{63}$$

The subsequent three terms yield [3, Lemmas 6.15 & 6.16]

$$h(X_k) - \mathsf{E}\bigl[\log|X_k|^2\bigr] - \log\pi = h\bigl(\log|X_k|^2\bigr) = \log\bigl(\alpha\log A^2 - \log\log A^2\bigr). \tag{64}$$
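
The two evaluations (63) and (64) can be checked numerically. The sketch below estimates $\mathsf{E}[\log|H|^2]$ by Monte Carlo, using that $|H|^2$ is exponentially distributed with unit mean for a unit-variance, circularly-symmetric complex Gaussian $H$, and computes the differential entropy of the uniform law in (55). Sample size, seed, and parameter values are assumptions for illustration; entropies are in nats.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def mean_log_fading_power(num_samples, rng):
    """Monte Carlo estimate of E[log |H|^2], where |H|^2 ~ Exp(1)."""
    return sum(math.log(rng.expovariate(1.0)) for _ in range(num_samples)) / num_samples


def log_uniform_entropy(A, alpha):
    """Differential entropy (nats) of log|X_k|^2, uniform on [log log A^2, alpha*log A^2]."""
    lower = math.log(math.log(A ** 2))
    upper = alpha * math.log(A ** 2)
    return math.log(upper - lower)


estimate = mean_log_fading_power(200_000, random.Random(1))  # should be close to -gamma
entropy = log_uniform_entropy(1e3, 0.3)                      # right-hand side of (64)
```

The Monte Carlo estimate only approximates $-\gamma$; its accuracy is limited by the sample size rather than by the formula.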
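Combining (58)-(64), and noting that the RHS of (62) does not depend on $k$, yields

$$\frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) \ge \frac{n-\kappa}{n} \Bigl[ \log\bigl(\alpha\log A^2 - \log\log A^2\bigr) - 1 - \gamma - \log\Bigl(\epsilon_{1,\kappa}^2 + \frac{\sigma^2}{\log A^2}\Bigr) - \varepsilon_1(\mathrm{SNR}, \kappa) \Bigr] \tag{65}$$

which tends to

$$\log\bigl(\alpha\log A^2 - \log\log A^2\bigr) - 1 - \gamma - \log\Bigl(\epsilon_{1,\kappa}^2 + \frac{\sigma^2}{\log A^2}\Bigr) - \varepsilon_1(\mathrm{SNR}, \kappa)$$

as $n$ tends to infinity. Using (59), and noting that

$$\lim_{\mathrm{SNR}\to\infty} \bigl\{ \log\bigl(\alpha\log A^2 - \log\log A^2\bigr) - \log\log\mathrm{SNR} \bigr\} = \log\alpha$$

$$\lim_{\mathrm{SNR}\to\infty} \log\Bigl(\epsilon_{1,\kappa}^2 + \frac{\sigma^2}{\log A^2}\Bigr) = \log\epsilon_{1,\kappa}^2$$

we obtain

$$\lim_{\mathrm{SNR}\to\infty} \Bigl\{ \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) - \log\log\mathrm{SNR} \Bigr\} \ge -1 - \gamma + \log\frac{1}{\epsilon_{1,\kappa}^2} + \log\alpha. \tag{66}$$

Letting $\kappa$ tend to infinity, the RHS of (66) tends to the RHS of

$$\lim_{\mathrm{SNR}\to\infty} \Bigl\{ \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) - \log\log\mathrm{SNR} \Bigr\} \ge -1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha. \tag{67}$$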
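
The first limit used above converges only very slowly in the SNR, which the following sketch makes visible. It is parameterized directly by $\log A^2$ to avoid overflow, and it assumes $\mathrm{SNR} = A^2/\sigma^2$; the function name and the numerical values are hypothetical.

```python
import math


def prelog_offset(log_A2, alpha, sigma2=1.0):
    """Return log(alpha*log A^2 - log log A^2) - log log SNR for a given log A^2."""
    log_snr = log_A2 - math.log(sigma2)  # assumes SNR = A^2 / sigma^2
    return math.log(alpha * log_A2 - math.log(log_A2)) - math.log(log_snr)


alpha = 0.4
moderate = prelog_offset(50.0, alpha)   # log A^2 = 50: still far below log(alpha)
extreme = prelog_offset(1e8, alpha)     # log A^2 = 1e8: close to log(alpha)
```

The gap to $\log\alpha$ behaves like $\log\log A^2 / \log A^2$, so the asymptotic statement says little about moderate SNR values.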



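7.2 Lower Bound on $\lim_{n\to\infty} \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n)$

We continue by lower-bounding the second term on the RHS of (54). The proof is similar to the proof of (67), and we will therefore skip some of the details. We start with the chain rule for mutual information to obtain

$$\begin{aligned}
I(X_1^n, X_{r,1}^n; Y_1^n) &= \sum_{k=1}^{n} I\bigl(X_k, X_{r,k}; Y_1^n \bigm| X_1^{k-1}, X_{r,1}^{k-1}\bigr) \\
&= \sum_{k=1}^{n} I\bigl(X_k, X_{r,k}; Y_1^n, X_1^{k-1}, X_{r,1}^{k-1}\bigr) \\
&\ge \sum_{k=\kappa+1}^{n} I\bigl(X_k, X_{r,k}; Y_{k-\kappa}^k, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr)
\end{aligned} \tag{68}$$

for every $0 < \kappa < n$. Here the second step follows because $\{(X_k, X_{r,k}), k \in \mathbb{Z}\}$ is i.i.d.; the third step follows from the nonnegativity of mutual information and because reducing observations does not increase mutual information. We next define $\varepsilon_2(\mathrm{SNR}, \kappa)$ as

$$\begin{aligned}
\varepsilon_2(\mathrm{SNR}, \kappa) &\triangleq I\bigl(X_k, X_{r,k}; Y_{k-\kappa}^k, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}, H_{3,k-\kappa}^{k-1}\bigr) - I\bigl(X_k, X_{r,k}; Y_{k-\kappa}^k, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) \\
&= I\bigl(X_k, X_{r,k}; H_{3,k-\kappa}^{k-1} \bigm| Y_{k-\kappa}^k, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr).
\end{aligned} \tag{69}$$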
We show in Appendix C that for every fixed $\kappa$

$$\lim_{\mathrm{SNR}\to\infty} \varepsilon_2(\mathrm{SNR}, \kappa) = 0. \tag{70}$$



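With this definition, every summand on the RHS of (68) can be lower-bounded by

$$\begin{aligned}
I\bigl(X_k, X_{r,k}; Y_{k-\kappa}^k, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) &= I\bigl(X_k, X_{r,k}; Y_{k-\kappa}^k, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}, H_{3,k-\kappa}^{k-1}\bigr) - \varepsilon_2(\mathrm{SNR}, \kappa) \\
&= I\bigl(X_k, X_{r,k}; Y_{k-\kappa}^k, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1} \bigm| H_{3,k-\kappa}^{k-1}\bigr) - \varepsilon_2(\mathrm{SNR}, \kappa) \\
&\ge I\bigl(X_k, X_{r,k}; Y_k \bigm| H_{3,k-\kappa}^{k-1}\bigr) - \varepsilon_2(\mathrm{SNR}, \kappa)
\end{aligned} \tag{71}$$

where the second step follows because $\{H_{3,k}, k \in \mathbb{Z}\}$ is independent of $(X_k, X_{r,k})$; and the third step follows because reducing observations does not increase mutual information.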
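As above, we express the fading $H_{3,k}$ as

$$H_{3,k} = \bar{H}_{3,k} + \tilde{H}_{3,k}$$

where $\bar{H}_{3,k} = \mathsf{E}\bigl[H_{3,k} \bigm| H_{3,k-\kappa}^{k-1}\bigr]$ and where $\tilde{H}_{3,k}$ is a zero-mean, complex Gaussian random variable of variance $\epsilon_{3,\kappa}^2$ satisfying

$$\lim_{\kappa\to\infty} \epsilon_{3,\kappa}^2 = \exp\biggl( \int_{-1/2}^{1/2} \log F_3'(\lambda) \,\mathrm{d}\lambda \biggr). \tag{72}$$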



We thus have

$$\begin{aligned}
I\bigl(X_k, X_{r,k}; Y_k \bigm| H_{3,k-\kappa}^{k-1}\bigr) &= h\bigl(Y_k \bigm| H_{3,k-\kappa}^{k-1}\bigr) - h\bigl(Y_k \bigm| X_k, X_{r,k}, H_{3,k-\kappa}^{k-1}\bigr) \\
&\ge h\bigl(Y_k \bigm| H_{3,k-\kappa}^{k-1}, H_{3,k}, H_{2,k}, X_k, Z_k\bigr) - h\bigl(Y_k \bigm| X_k, X_{r,k}, \bar{H}_{3,k}\bigr) \\
&= h\bigl(H_{3,k}X_{r,k} \bigm| H_{3,k}\bigr) - h\bigl(\tilde{H}_{3,k}X_{r,k} + H_{2,k}X_k + Z_k \bigm| X_k, X_{r,k}\bigr)
\end{aligned} \tag{73}$$

where the second step follows because conditioning does not increase entropy and because $\bar{H}_{3,k}$ is a function of $H_{3,k-\kappa}^{k-1}$; and the last step follows from the property of differential entropy under translation and because, conditioned on $H_{3,k}$, the random variable $H_{3,k}X_{r,k}$ is independent of $(H_{3,k-\kappa}^{k-1}, H_{2,k}, X_k, Z_k)$.

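As above, we further lower-bound the RHS of (73) as

$$\begin{aligned}
I\bigl(X_k, X_{r,k}; Y_k \bigm| H_{3,k-\kappa}^{k-1}\bigr)
&\ge h\bigl(H_{3,k}X_{r,k} \bigm| H_{3,k}\bigr) - h\bigl(\tilde{H}_{3,k}X_{r,k} + H_{2,k}X_k + Z_k \bigm| X_k, X_{r,k}\bigr) \\
&= \mathsf{E}\bigl[\log|H_{3,k}|^2\bigr] + h(X_{r,k}) - \mathsf{E}\bigl[\log|X_{r,k}|^2\bigr] - h\Bigl(\tilde{H}_{3,k} + H_{2,k}\frac{X_k}{X_{r,k}} + \frac{Z_k}{X_{r,k}} \Bigm| X_k, X_{r,k}\Bigr) \\
&= \log\bigl(\log A_r^2 - \beta\log A_r^2\bigr) - \gamma - 1 - \mathsf{E}\Bigl[\log\Bigl(\epsilon_{3,\kappa}^2 + \frac{|X_k|^2}{|X_{r,k}|^2} + \frac{\sigma^2}{|X_{r,k}|^2}\Bigr)\Bigr] \\
&\ge \log\log A_r^2 + \log(1-\beta) - \gamma - 1 - \log\Bigl(\epsilon_{3,\kappa}^2 + \frac{A^{2\alpha}}{A_r^{2\beta}} + \frac{\sigma^2}{A_r^{2\beta}}\Bigr) \\
&= \log\log(\rho^2 A^2) + \log(1-\beta) - \gamma - 1 - \log\Bigl(\epsilon_{3,\kappa}^2 + \rho^{-2\beta}A^{-2(\beta-\alpha)} + \rho^{-2\beta}\frac{\sigma^2}{A^{2\beta}}\Bigr)
\end{aligned} \tag{74}$$

where the second step follows from the behavior of differential entropy under scaling by a complex number; the third step follows by evaluating $\mathsf{E}[\log|H_{3,k}|^2] + h(X_{r,k}) - \mathsf{E}[\log|X_{r,k}|^2]$ and because, conditioned on $(X_k, X_{r,k})$, the random variable $\tilde{H}_{3,k} + H_{2,k}X_k/X_{r,k} + Z_k/X_{r,k}$ is complex Gaussian; the fourth step follows because, for our choice of input distribution (55) and (56), we have $|X_k|^2 \le A^{2\alpha}$ and $|X_{r,k}|^2 \ge A_r^{2\beta}$ with probability one; and the last step follows from (8).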

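Combining (68)-(74), and noting that the RHS of (74) does not depend on $k$, we obtain

$$\frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n) \ge \frac{n-\kappa}{n} \Bigl[ \log\log(\rho^2 A^2) + \log(1-\beta) - \gamma - 1 - \log\Bigl(\epsilon_{3,\kappa}^2 + \rho^{-2\beta}A^{-2(\beta-\alpha)} + \rho^{-2\beta}\frac{\sigma^2}{A^{2\beta}}\Bigr) - \varepsilon_2(\mathrm{SNR}, \kappa) \Bigr] \tag{75}$$

which tends to

$$\log\log(\rho^2 A^2) + \log(1-\beta) - \gamma - 1 - \log\Bigl(\epsilon_{3,\kappa}^2 + \rho^{-2\beta}A^{-2(\beta-\alpha)} + \rho^{-2\beta}\frac{\sigma^2}{A^{2\beta}}\Bigr) - \varepsilon_2(\mathrm{SNR}, \kappa)$$

as $n$ tends to infinity. Using (70), and noting that for every $0 < \alpha < \beta < 1$

$$\lim_{\mathrm{SNR}\to\infty} \bigl\{ \log\log(\rho^2 A^2) - \log\log\mathrm{SNR} \bigr\} = 0$$

$$\lim_{\mathrm{SNR}\to\infty} \log\Bigl(\epsilon_{3,\kappa}^2 + \rho^{-2\beta}A^{-2(\beta-\alpha)} + \rho^{-2\beta}\frac{\sigma^2}{A^{2\beta}}\Bigr) = \log\epsilon_{3,\kappa}^2$$

we obtain

$$\lim_{\mathrm{SNR}\to\infty} \Bigl\{ \lim_{n\to\infty} \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n) - \log\log\mathrm{SNR} \Bigr\} \ge -1 - \gamma + \log\frac{1}{\epsilon_{3,\kappa}^2} + \log(1-\beta). \tag{76}$$

As we let $\kappa$ tend to infinity, the RHS of (76) tends to the RHS of

$$\lim_{\mathrm{SNR}\to\infty} \Bigl\{ \lim_{n\to\infty} \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n) - \log\log\mathrm{SNR} \Bigr\} \ge -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\beta). \tag{77}$$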

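7.3 Maximizing Over $\alpha$ and $\beta$

It follows from (54), (67) and (77) that with a DF strategy, we can achieve the fading number

$$\chi \ge \min\Bigl\{ -1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha,\; -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\beta) \Bigr\} \tag{78}$$

for every $0 < \alpha < \beta < 1$. We prove Theorem 3 by maximizing over $\alpha$ and $\beta$. To this end, we first note that the optimal choice of $\alpha$ and $\beta$ (in the sense that it achieves the supremum over $0 < \alpha < \beta < 1$) must satisfy $\alpha = \beta$, since otherwise we could either increase $\alpha$ or decrease $\beta$ to obtain a higher achievable rate. We further note that the optimal $\alpha = \beta$ must satisfy

$$-1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha = -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\alpha). \tag{79}$$

Indeed, if we had

$$-1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha < -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\alpha) \tag{80}$$

then it would follow that

$$\min\Bigl\{ -1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha,\; -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\alpha) \Bigr\} = -1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha \tag{81}$$

which could be increased by increasing $\alpha$. In contrast, if we had

$$-1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha > -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\alpha) \tag{82}$$

then it would follow that

$$\min\Bigl\{ -1 - \gamma + \log\frac{1}{\epsilon_1^2} + \log\alpha,\; -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\alpha) \Bigr\} = -1 - \gamma + \log\frac{1}{\epsilon_3^2} + \log(1-\alpha) \tag{83}$$

which could be increased by decreasing $\alpha$. Solving (79) yields

$$\alpha = \frac{\epsilon_1^2}{\epsilon_1^2 + \epsilon_3^2}. \tag{84}$$

Combining (84) with (78) proves Theorem 3.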
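
The balancing argument can be checked numerically. The sketch below evaluates the RHS of (78) with $\beta = \alpha$ and compares the choice (84) against a grid of alternatives; the prediction-error values are hypothetical and the function names are introduced only for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def df_fading_number(eps1_sq, eps3_sq, alpha):
    """RHS of (78) evaluated at beta = alpha (the boundary of the admissible set)."""
    relay_term = -1 - EULER_GAMMA + math.log(1 / eps1_sq) + math.log(alpha)
    receiver_term = -1 - EULER_GAMMA + math.log(1 / eps3_sq) + math.log(1 - alpha)
    return min(relay_term, receiver_term)


def optimal_alpha(eps1_sq, eps3_sq):
    """Solution (84) of the balance equation (79)."""
    return eps1_sq / (eps1_sq + eps3_sq)


# Hypothetical prediction errors for the transmitter-relay and relay-receiver links.
eps1_sq, eps3_sq = 0.2, 0.05
alpha_star = optimal_alpha(eps1_sq, eps3_sq)
best = df_fading_number(eps1_sq, eps3_sq, alpha_star)
closed_form = -1 - EULER_GAMMA + math.log(1 / (eps1_sq + eps3_sq))
```

Substituting (84) into either term of (78) gives $-1 - \gamma + \log\frac{1}{\epsilon_1^2 + \epsilon_3^2}$, which is the value `closed_form` computes.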
8 Quantize Map and Forward



Recently, a strategy called quantize-map-and-forward was introduced by Avestimehr et al. [17].
They showed that the quantize-map-and-forward scheme achieves rates that are within a constant 
gap of the max-flow min-cut upper bound, where the gap depends on the number of relays but not 
on the channel parameters. For example, for the Gaussian relay channel with a single relay, and 
for the two-relay Gaussian diamond network, the gap is not more than one bit. 

However, for the Gaussian relay channel with a single relay, rates that are within one bit of the 
max-flow min-cut upper bound can also be achieved by a decode-and-forward scheme [17, Th. 3.1]. 
We therefore believe that for the above fading relay channel, the quantize-map-and-forward scheme 
will give rise to communication rates that are comparable to the ones presented in Theorem 3. 
(For fading relay channels with more than one relay, the quantize-map-and-forward scheme may be 
superior to the decode-and-forward scheme.) 

Indeed, if the link between the transmitter and the relay supports higher rates than the link 
between the relay and the receiver, then the decode-and-forward scheme achieves rates that are 
within one bit of the capacity (Corollary 4). If the link between the transmitter and the relay 
supports smaller rates than the link between the relay and the receiver, then the gap between the 
upper bound (16) and the lower bound (21) can be larger than one bit. 

9 Conclusion 

We studied the high-SNR asymptotic capacity of fading relay channels. We considered a noncoherent model where all terminals are aware of the statistics of the fading, but not of their realizations. We demonstrated that, if the link between the transmitter and the receiver supports higher communication rates than the link between the relay and the receiver, then at high SNR it is optimal to turn the relay off. We further demonstrated that if the link between the transmitter and the relay supports higher communication rates than the link between the relay and the receiver, then a decode-and-forward strategy achieves communication rates that are within one bit of the high-SNR capacity of the multiple-input single-output fading channel that results when the transmitter and the relay can cooperate.



A Proof of Proposition 5 

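To prove Proposition 5, we evaluate (54) for $\{X_k, k \in \mathbb{Z}\}$ and $\{X_{r,k}, k \in \mathbb{Z}\}$ being i.i.d., circularly-symmetric random variables, independent of each other and with

$$\log|X_k|^2 \sim \mathcal{U}\bigl([\alpha\log(\delta^2 A^2), \alpha\log A^2]\bigr), \quad k \in \mathbb{Z} \tag{85}$$
$$\log|X_{r,k}|^2 \sim \mathcal{U}\bigl([\log(\delta_r^2 A_r^2), \log A_r^2]\bigr), \quad k \in \mathbb{Z} \tag{86}$$

where $0 < \alpha, \delta, \delta_r < 1$. We shall first evaluate the first term on the RHS of (54), namely,

$$\lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) = \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n). \tag{87}$$

To lower-bound (87), we use [18, Prop. 4.1] with $A^2$ replaced by $A^{2\alpha}$, and with $\delta$ replaced by $\delta^\alpha$. We thus obtain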



ta. i/(*r ; y»,) > iog(^) - r io 6 (f; ( a) + ^) cu 



CX P ; , i w„ , 2 „ Ei 



a\og{^)S"A 2a J ^ a\og(^)5^A 2a J 

= log v FsNir ) - 7_ 1/a log (^ (A) + ^sNir ) dA 

/ a 2(l-a) \ / CT 2(l-a) \ 

-exp rT ^ Ei rT ^ , SNR>0. (88) 

P Ulog <5«SNR Q J { cdog «5«SNR Q I' V ; 
We next lower-bound the second term on the RHS of (54), namely, $\lim_{n\to\infty} \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n)$. We use the chain rule for mutual information and a Cesàro-type theorem [12, Th. 4.2.3] to lower-bound

lim -I(X^X^;YP) 



1 

= lim - J2l(X k ,X rtk ;Y? \ X^^X^ 1 ) 

fc=l 

> lim iV/^,!^;^ I xt\x^) 



fc=l 



> lim l(X k ,X r , k ;Yf \ X*~\ X^ 1 ) 

k^-oc 



> lim inf J I X k ,X r . k \ Y k , 

k^t-oc \ 



X r , 



fe-1 



fe-1 



fc-1 v fc-l 



A r,l 



(89) 



where $\underline{\lim}$ denotes the limit inferior, and where the infimum in the last step is over all $(x_1^{k-1}, x_{r,1}^{k-1})$ satisfying

$$\delta^{2\alpha}A^{2\alpha} \le |x_\ell|^2 \le A^{2\alpha} \quad\text{and}\quad \delta_r^2 A_r^2 \le |x_{r,\ell}|^2 \le A_r^2, \qquad \ell = 1, 2, \ldots, k-1.$$

Here the second step follows because reducing observations does not increase mutual information.

We next show that the RHS of (89) is minimized when $|x_\ell|^2 = A^{2\alpha}$ and $|x_{r,\ell}|^2 = \delta_r^2 A_r^2$ for $\ell = 1, 2, \ldots, k-1$. Indeed, conditioned on $(X_1^{k-1}, X_{r,1}^{k-1}) = (x_1^{k-1}, x_{r,1}^{k-1})$, the random variables $Y_\ell/x_{r,\ell}$ are given by

$$\frac{Y_\ell}{x_{r,\ell}} = H_{3,\ell} + H_{2,\ell}\frac{x_\ell}{x_{r,\ell}} + \frac{Z_\ell}{x_{r,\ell}} \stackrel{\mathscr{L}}{=} H_{3,\ell} + \sqrt{\frac{|x_\ell|^2}{|x_{r,\ell}|^2} + \frac{\sigma^2}{|x_{r,\ell}|^2}}\; W_\ell, \qquad \ell = 1, 2, \ldots, k-1 \tag{90}$$

where $\{W_k, k \in \mathbb{Z}\}$ is a sequence of i.i.d., zero-mean, unit-variance, complex Gaussian random variables, and where $A \stackrel{\mathscr{L}}{=} B$ indicates that $A$ and $B$ have the same law. Here we use in the second step that $\{H_{2,k}, k \in \mathbb{Z}\}$ and $\{Z_k, k \in \mathbb{Z}\}$ are both sequences of i.i.d. Gaussian random variables. The second term on the RHS of (90) can be viewed as an additive-noise term. Thus, by choosing $|x_\ell|^2 = A^{2\alpha}$ and $|x_{r,\ell}|^2 = \delta_r^2 A_r^2$, the variance of the additive noise is maximized. By contradiction, it is easy to show that maximizing the variance of the additive noise minimizes the mutual information. Indeed, suppose that the noise that minimizes the mutual information is not the one with maximum variance. Then, we can add i.i.d. zero-mean Gaussian noise $\{U_k, k \in \mathbb{Z}\}$ to $Y_\ell/x_{r,\ell}$ such that $Y_\ell/x_{r,\ell} + U_\ell$ has the same distribution as $Y_\ell/x_{r,\ell}$ when $|x_\ell|^2 = A^{2\alpha}$ and $|x_{r,\ell}|^2 = \delta_r^2 A_r^2$. Since adding noise cannot increase the mutual information, it follows that the choice $|x_\ell|^2 = A^{2\alpha}$ and $|x_{r,\ell}|^2 = \delta_r^2 A_r^2$ minimizes the mutual information.

We thus obtain for the RHS of (89)



lim inf/ X k ,X ryk ; Y k , 

k— >oo \ 



fe-1 



Xr,i J t=l 



XI 



fe-i 



fe-1 v-k-l 
l l i A r,l 



= lim 71 X k , X r>k ; Y k , <J H 3it 



l 



lim / X k ,X r . k ;Y k 

k— >oo V 



H. 



5 2 p 2 A 2{1 - a) 
1 



+ 



3,e - 



S 2 p 2 A< 



„fc-i 



+ 



fe-i N 



=i / 

fe-i s 



.*2p2 A 2(l-a) S 2 p 2 A 2 

where we have used that A r = pA (8). Here the first step follows because the joint law of 

1 



(91) 



X k ,X rik ,Y k , < H 3 j + 



S 2 p 2 A 



2(l-c) 



+ 



8 2 p 2 A' 



fe-i N 



does not depend on $(x_1^{k-1}, x_{r,1}^{k-1})$; and the last step follows because the pair $(X_k, X_{r,k})$ is independent of $(\{H_{3,k}, k \in \mathbb{Z}\}, \{W_k, k \in \mathbb{Z}\})$.

We continue by expressing the mutual information as the difference of two differential entropies, i.e., we have



= h(v k 

-h\Y k 



H 3 .t + 



+ 



5 r VA 2(1 - tt) ' & 2 P 2 A 2 



1 



Xk,X r ^, s H 3 j 



+ 



1 



W e 



We 

k-V 

1=1, 
a 2 



k-V 



5 2 p 2 A 2 ^ ' 6?fPA 2 



k-V 



(92) 



For the second differential entropy, we have 



hi Yi 



1 



+ 



= E[log|X r , fe | 2 ] 

+ h ^H 3 ,k + H 2 ,k 
= \og(S r p 2 A 2 ) + logTT + 1 + log(e§ ifc (0 + £) 



W e 



Xi 



5 2 p 2 A 2 ^ S 2 p 2 A 2 

Xk, X r k, H 3 ,t + 



k-V 



k, | J 



f 



5?p 2 A 



2j2(l-«) 



(52p2 A 2 



k-v 



(93) 



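where

$$\xi \triangleq \frac{1}{\delta_r^2\rho^2 A^{2(1-\alpha)}} + \frac{\sigma^2}{\delta_r^2\rho^2 A^2}$$

and where $\epsilon_{3,k}^2(\xi)$ denotes the mean-square error in predicting $H_{3,k}$ from $(H_{3,k-1} + \sqrt{\xi}\,W_{k-1}), \ldots, (H_{3,1} + \sqrt{\xi}\,W_1)$. Note that [5, Sec. III]

$$\lim_{k\to\infty} \epsilon_{3,k}^2(\xi) = \exp\biggl( \int_{-1/2}^{1/2} \log\bigl(F_3'(\lambda) + \xi\bigr) \,\mathrm{d}\lambda \biggr) - \xi. \tag{94}$$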

The last step in (93) follows by evaluating E[log|X r ^| 2 ] and by noting that, conditioned on 



Xk , X; 



r 



#3.£ + 



5 2 p 2 A 2 ^- a ^ S 2 p 2 A 2 



w e 



k-V 



?=1 



the random variable + Hi^ k X k l X r ^ k + Z k /X r>k is complex Gaussian with variance e 2 fe (£) + 
For the first differential entropy on the RHS of (92), we have 



h Y k 



H 



+ 



a 



Wt 



k-V 



S 2 p 2 A 2 ^ «52p2 A 2 
> h(H 3i kX r! k + H 2 .kXk + Z k | H 3t k) 

l og f e log\H 3]k \ 2 + h(X rtk ) +e h(H 2 , k X k ) +7recr 2 



> E 



(95) 



where the first step follows because conditioning does not increase entropy and because, conditional 
on H 3k , the channel output Y k is independent of 

fe-i 



H 3 ,e + 



+ 



J 2 r p 2 A 2 ^ ' <5 2 p 2 A 2 , 

and the second step follows from the Entropy Power Inequality [12, Th. 16.6.3] and from the property 
of differential entropy under scaling by a complex number. For the distribution (86) on X r , k , we 
have 



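$$\begin{aligned}
h(X_{r,k}) &= h\bigl(\log|X_{r,k}|^2\bigr) + \mathsf{E}\bigl[\log|X_{r,k}|^2\bigr] + \log\pi \\
&= \log\Bigl(\log\Bigl(\frac{1}{\delta_r^2}\Bigr)\delta_r A_r^2 \pi\Bigr)
\end{aligned} \tag{96}$$

where we use in the first step that $X_{r,k}$ is circularly-symmetric [3, Lemmas 6.15 & 6.16]. Similarly, the entropy $h(H_{2,k}X_k)$ is lower-bounded by

$$\begin{aligned}
h(H_{2,k}X_k) &\ge h(H_{2,k}X_k \mid H_{2,k}) \\
&= \mathsf{E}\bigl[\log|H_{2,k}|^2\bigr] + h(X_k) \\
&= \log\Bigl(e^{-\gamma}\alpha\log\Bigl(\frac{1}{\delta^2}\Bigr)\delta^\alpha A^{2\alpha}\pi\Bigr)
\end{aligned} \tag{97}$$

where we use that $\mathsf{E}[\log|H_{2,k}|^2] = -\gamma$ and $h(X_k) = h(\log|X_k|^2) + \mathsf{E}[\log|X_k|^2] + \log\pi$ [3, Lemmas 6.15 & 6.16]. Combining (96) and (97) with (95) yields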



h\Y k 
> E 



H, 



+ 



S 2 p 2 A 2(l-a) S 2 p 2 J ^ ^ 



log | \H 3 M \og[ p )^A 2 7r + e-Talog( ^ ) S a A Za ir + irea 



5 2 



= log 7T + log log + \og(5 rP 2 A 2 ) + E 



log \H 3 , k \ 2 + 



= logTr + log log — + log((5 r p 2 A 2 ) + log 

o r 

/ e-^a\og(^)5 a A 2a + ea 2 ^ 



- exp 



log(^)<5 r/ 9 2 A 2 



Ei 



-iq\og{±)8 a A 2a +. 
log(^)<S r p 2 A 2 

'e-ia\og(jz)5 a A 2a +e<7 2 \ 
k log(^Kp 2 A 2 ) 

yq\og(^)S a A 2a + ea 2 \ 
log(£)<5 r p 2 A 2 



(98) 



where the last step follows by noting that $|H_{3,k}|^2$ has an exponential distribution for which the expectation can be computed using [22, Sec. 4.337]. Combining (98) and (93) yields



I Xk, X r y, Yk 



{ 



k-r 



> 



^VA 2(1 - a) ^VAV"7, =ly l 

1 f e-~<alog(-k)S a A 2a + ea 2 \ . , /x 



exp 



e-Talog(^)(5 a A 2a + ea 2 \ ( e^a \og(±)5 a A 2a + ea 



v log(^)(5 r p 2 A 2 , v 

where $\xi$ is given in (93). By (94), the RHS of (99) tends to

1 ( e-<a\og{-k)8 a A 2a + ea 2 ^ 

loglog-.-1 + log - ^> 



log(^)<5 r p 2 A 2 



(99) 



51 

.1/2 



\og(±)8 rP iA 2 



-LM n 



(A) + 



1 



+ 



a 



-1/2 

— exp 



5 2 p 2 A 2 ^ 6 2 p 2 A* 



dA 



, alog(^)(5 a A za + e^ ^ ( alog^S A 2a + ea 2 



log(^)(5 r p2 A 2 



log(£)£ r p 2 A 2 



(100) 



as $k$ tends to infinity. It thus follows from (89)-(100) that
lim -l[X?,X?i\Y?) 

n— >oo n ' 

1 (e^a\og{^)S a A 2a + ea 2 \ 

> log log — - 1 + log I - ' 



-fSM F > 



(A) + 



log(^)5 r p 2 A 2 
1 a- 



5 2 p 2 A 2 ^~ a) S^piA- 



dA 
e-ia\og(jz)5 a A 2a +e<j 2
log(^)<5 r p 2 A 2 

(J 2(a-l) +e \ 



log(^)<5 Q SNR Q a 2 («- 1 ) 



log(^)<S rP 2 SNR 



+ e 



(101) 



for $\mathrm{SNR} > 0$. Combining (88), (101), and (54), and maximizing over $0 < \delta, \alpha, \delta_r < 1$, proves Proposition 5.

B Proof of Proposition 6 

The proof of Proposition 6 is an extension of the DF strategy analyzed in [15] to channels with 
memory. As in the memoryless case, it uses a technique called block-Markov superposition encoding. 
Most steps of the proof for memoryless channels [15] can be easily extended to channels with memory 
by defining the set of typical sequences via entropy rates rather than via entropies, cf. (103). The 
main difference is that for memoryless channels, the events (114) and (115) are independent of each 
other, whereas for channels with memory, these events are dependent. Consequently, we obtain 
a third term on the RHS of (116), for which we need to show that its exponent equals zero,
cf. (119). Below we give a detailed proof (cf. [23, Ch. 9]). 

Codebook construction: Encoding is performed in $B + 1$ blocks of $n$ symbols. For each block, we generate a separate codebook. That is, we fix some distribution $P_{X,X_r}(\cdot)$ and some rate $R$. Then, for every block $b$, $b = 1, \ldots, B+1$, the codebook of the relay is constructed by drawing $e^{nR}$ codewords $x_{r,1}^n(v; b)$, $v = 1, \ldots, e^{nR}$ i.i.d. according to the distribution $P_{X_r}(\cdot)$. As for the codebook of the transmitter, for every $v = 1, \ldots, e^{nR}$ we generate $e^{nR}$ codewords $x_1^n(w, v; b)$, $w = 1, \ldots, e^{nR}$ independently according to the conditional distribution $P_{X|X_r}(\cdot)$, i.e., we draw each symbol $x_k(w, v; b)$ according to $P_{X|X_r}(\cdot \mid x_{r,k}(v; b))$.

In the proof, we assume that $P_{X,X_r}(\cdot)$ is such that the random variables $(X_1^n, X_{r,1}^n)$ have a probability density function, which implies that also $(X_1^n, X_{r,1}^n, Y_{r,1}^n, Y_1^n)$ have a probability density function. (We shall denote the probability density function of the random variable $A$ by $f_A(\cdot)$.) The case where $P_{X,X_r}(\cdot)$ does not allow for a probability density function $f_{X_1^n, X_{r,1}^n, Y_{r,1}^n, Y_1^n}(\cdot)$ can be treated by partitioning the sample spaces of the channel inputs and outputs into a finite collection of mutually exclusive events, and by studying the resulting discrete problem following the steps below. (To this end, we need to replace the differential entropy rates in the definition of jointly typical sequences (103) with entropy rates.) The result follows then by taking the supremum over all partitions, cf. [24, Sec. 2.5].

Transmitter: The message $m$ to be transmitted is divided into $B$ equally-sized blocks $m_1, \ldots, m_B$ of $nR$ nats each, i.e., each $m_b$ takes values in $\{1, \ldots, e^{nR}\}$. In block $b$, $b = 1, \ldots, B+1$, the transmitter sends out the codeword $x_1^n(m_b, m_{b-1}; b)$, where we define $m_0 = m_{B+1} = 1$.
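
The block-Markov structure of the transmission can be sketched as follows: in block $b$ the transmitter's codeword is indexed by the pair $(m_b, m_{b-1})$, with the dummy indices $m_0 = m_{B+1} = 1$, so that $B$ message blocks occupy $B + 1$ transmission blocks. The function names and the example message values are hypothetical and serve only to illustrate the indexing.

```python
def block_markov_schedule(messages):
    """Return the index pairs (m_b, m_{b-1}) sent in blocks b = 1, ..., B+1.

    The list `messages` holds m_1, ..., m_B; the dummy indices m_0 = m_{B+1} = 1
    are added at both ends, as in the description of the transmitter.
    """
    padded = [1] + list(messages) + [1]
    return [(padded[b], padded[b - 1]) for b in range(1, len(messages) + 2)]


def overall_rate(block_rate, num_message_blocks):
    """Overall rate B*R/(B+1) when B message blocks are sent over B+1 blocks."""
    return num_message_blocks * block_rate / (num_message_blocks + 1)


schedule = block_markov_schedule([5, 7, 2])  # B = 3 hypothetical message indices
rate = overall_rate(1.0, 9)                  # B = 9 blocks at R = 1 nat per channel use
```

Each message index appears in two consecutive blocks, first as the fresh message and then as the index the relay forwards; the rate loss factor $B/(B+1)$ vanishes as $B$ grows.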

Relay: After the transmission of block $b$ is completed, the relay has observed the sequence of outputs $y_{r,1}^n(b)$. The relay tries to find an $\hat{m}_{r,b}$ such that





(103) 
where denotes the set of sequences with $\ell \in \mathcal{I}$; $|\mathcal{I}|$ denotes the cardinality of the set $\mathcal{I}$; and $h(\{A_{\ell,k}\})$ denotes the entropy rate of the random processes $\{A_{\ell,k}, k \in \mathbb{Z}\}$, $\ell \in \mathcal{I}$, i.e.,

$$h(\{A_{\ell,k}\}) \triangleq \lim_{n\to\infty} \frac{1}{n} h\bigl(\{A_{\ell,1}^n\}_{\ell\in\mathcal{I}}\bigr).$$

If one or more $\hat{m}_{r,b}$ can be found satisfying (102), then the relay chooses one of them, calls this choice $\hat{m}_{r,b}$, and transmits $x_{r,1}^n(\hat{m}_{r,b}; b+1)$ in the subsequent block. If no such $\hat{m}_{r,b}$ is found, then the relay sets $\hat{m}_{r,b} = 1$ and transmits $x_{r,1}^n(1; b+1)$ in the subsequent block.

Receiver: After block $b$, $b = 2, \ldots, B+1$, the receiver has observed the outputs $y_1^n(b-1)$ and $y_1^n(b)$. It tries to find an $\hat{m}_{b-1}$ such that

$$\bigl(x_1^n(\hat{m}_{b-1}, \hat{m}_{b-2}; b-1),\, x_{r,1}^n(\hat{m}_{b-2}; b-1),\, y_1^n(b-1)\bigr) \in \mathcal{A}_\epsilon(X_1^n, X_{r,1}^n, Y_1^n) \tag{104}$$

$$\bigl(x_{r,1}^n(\hat{m}_{b-1}; b),\, y_1^n(b)\bigr) \in \mathcal{A}_\epsilon(X_{r,1}^n, Y_1^n) \tag{105}$$

where $\hat{m}_{b-2}$ is the receiver's estimate of $m_{b-2}$. If one or more such $\hat{m}_{b-1}$ are found, then the receiver chooses one of them and calls this choice $\hat{m}_{b-1}$. If no such $\hat{m}_{b-1}$ is found, then the receiver sets $\hat{m}_{b-1} = 1$.

Analysis: For each block $b$, $b = 1, \ldots, B+1$, let $\mathcal{E}_{r,b}^{(0)}$ denote the event that the relay cannot find an $\hat{m}_{r,b}$ that satisfies (102), and let $\mathcal{E}_{r,b}^{(+)}$ denote the event that the relay chooses an $\hat{m}_{r,b} \ne m_b$ satisfying (102). Similarly, let $\mathcal{E}_{b}^{(0)}$ denote the event that the receiver cannot find an $\hat{m}_{b-1}$ that satisfies (104) and (105), and let $\mathcal{E}_{b}^{(+)}$ denote the event that the receiver chooses an $\hat{m}_{b-1} \ne m_{b-1}$ satisfying (104) and (105). Finally, let $\mathcal{F}_{b-1}$ be the event that no errors have been made up to block $b$. The probability of error is upper-bounded by

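$$\begin{aligned}
\Pr(\text{error}) &\le \Pr\biggl( \bigcup_{b=1}^{B} \bigl(\mathcal{E}_{r,b}^{(0)} \cup \mathcal{E}_{r,b}^{(+)}\bigr) \cup \bigcup_{b=2}^{B+1} \bigl(\mathcal{E}_{b}^{(0)} \cup \mathcal{E}_{b}^{(+)}\bigr) \biggr) \\
&= \Pr\bigl(\mathcal{E}_{r,1}^{(0)} \cup \mathcal{E}_{r,1}^{(+)}\bigr) + \sum_{b=2}^{B} \Pr\Bigl( \bigl(\bigl(\mathcal{E}_{r,b}^{(0)} \cup \mathcal{E}_{r,b}^{(+)}\bigr) \cup \bigl(\mathcal{E}_{b}^{(0)} \cup \mathcal{E}_{b}^{(+)}\bigr)\bigr) \cap \mathcal{F}_{b-1} \Bigr) \\
&\quad + \Pr\Bigl( \bigl(\mathcal{E}_{B+1}^{(0)} \cup \mathcal{E}_{B+1}^{(+)}\bigr) \cap \mathcal{F}_{B} \Bigr).
\end{aligned} \tag{106}$$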

It follows that we can analyze every block separately by assuming that no errors were made in 
the previous blocks. The overall probability of error is then upper-bounded by (B + 1) times the 
maximum error probability of each block. 

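Suppose that no errors occurred up to block $b$. Then, it follows from the union bound that the error probability in block $b$ is upper-bounded by

$$\begin{aligned}
&\Pr\Bigl( \bigl(\bigl(\mathcal{E}_{r,b}^{(0)} \cup \mathcal{E}_{r,b}^{(+)}\bigr) \cup \bigl(\mathcal{E}_{b}^{(0)} \cup \mathcal{E}_{b}^{(+)}\bigr)\bigr) \cap \mathcal{F}_{b-1} \Bigr) \\
&\quad \le \Pr\bigl(\mathcal{E}_{r,b}^{(0)} \cap \mathcal{F}_{b-1}\bigr) + \Pr\bigl(\mathcal{E}_{r,b}^{(+)} \cap \mathcal{F}_{b-1}\bigr) + \Pr\bigl(\mathcal{E}_{b}^{(0)} \cap \mathcal{F}_{b-1}\bigr) + \Pr\bigl(\mathcal{E}_{b}^{(+)} \cap \mathcal{F}_{b-1}\bigr).
\end{aligned} \tag{107}$$

In order to upper-bound (107), we first note that, for a given $(m_b, m_{b-1})$, the process $\{(X_k, X_{r,k}), k \in \mathbb{Z}\}$ is i.i.d. and jointly independent of the stationary and ergodic, complex Gaussian fading processes $\{H_{\ell,k}, k \in \mathbb{Z}\}$, $\ell = 1, 2, 3$ and of the i.i.d. Gaussian noise processes $\{Z_{r,k}, k \in \mathbb{Z}\}$ and $\{Z_k, k \in \mathbb{Z}\}$, which implies that the process $\{(X_k, X_{r,k}, Y_{r,k}, Y_k), k \in \mathbb{Z}\}$ is jointly stationary and ergodic. It thus follows from the Shannon-McMillan-Breiman theorem [25, Th. 2] that

$$\lim_{n\to\infty} \Pr\Bigl( \bigl(X_1^n(m_b, m_{b-1}; b),\, X_{r,1}^n(m_{b-1}; b),\, Y_{r,1}^n(b)\bigr) \in \mathcal{A}_\epsilon(X_1^n, X_{r,1}^n, Y_{r,1}^n) \Bigr) = 1 \tag{108}$$

$$\lim_{n\to\infty} \Pr\Bigl( \bigl(X_1^n(m_{b-1}, m_{b-2}; b-1),\, X_{r,1}^n(m_{b-2}; b-1),\, Y_1^n(b-1)\bigr) \in \mathcal{A}_\epsilon(X_1^n, X_{r,1}^n, Y_1^n) \Bigr) = 1 \tag{109}$$

$$\lim_{n\to\infty} \Pr\Bigl( \bigl(X_{r,1}^n(m_{b-1}; b),\, Y_1^n(b)\bigr) \in \mathcal{A}_\epsilon(X_{r,1}^n, Y_1^n) \Bigr) = 1. \tag{110}$$

This implies that

$$\Pr\bigl(\mathcal{E}_{r,b}^{(0)} \cap \mathcal{F}_{b-1}\bigr) \quad\text{and}\quad \Pr\bigl(\mathcal{E}_{b}^{(0)} \cap \mathcal{F}_{b-1}\bigr)$$

both tend to zero as $n$ tends to infinity.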

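We continue with the error event $(\mathcal{E}_{r,b}^{(+)} \cap \mathcal{F}_{b-1})$. This event occurs if the relay finds an $\hat{m}_{r,b} \ne m_b$ such that

$$\bigl(x_1^n(\hat{m}_{r,b}, m_{b-1}; b),\, x_{r,1}^n(m_{b-1}; b),\, y_{r,1}^n(b)\bigr) \in \mathcal{A}_\epsilon(X_1^n, X_{r,1}^n, Y_{r,1}^n).$$

Since we have $\hat{m}_{r,b} \ne m_b$, it follows that $\bigl(X_1^n(\hat{m}_{r,b}, m_{b-1}; b), X_{r,1}^n(m_{b-1}; b), Y_{r,1}^n(b)\bigr)$ is distributed according to $P_{X_{r,1}^n}(\cdot)P_{X_1^n|X_{r,1}^n}(\cdot)P_{Y_{r,1}^n|X_{r,1}^n}(\cdot)$. Extending [12, Th. 14.2.3] to channels with memory³ yields that, for every $\hat{m}_{r,b} \ne m_b$, we have

$$\Pr\bigl(\text{(102) is satisfied}\bigr) \le \exp\Bigl( -n\Bigl( \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) - 6\epsilon \Bigr) \Bigr). \tag{111}$$

It thus follows from the union bound that

$$\Pr\bigl(\mathcal{E}_{r,b}^{(+)} \cap \mathcal{F}_{b-1}\bigr) \le \exp\Bigl( n\Bigl( R - \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) + 6\epsilon \Bigr) \Bigr). \tag{112}$$

This implies that, if

$$R < \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_{r,1}^n \mid X_{r,1}^n) - 6\epsilon \tag{113}$$

then the probability $\Pr\bigl(\mathcal{E}_{r,b}^{(+)} \cap \mathcal{F}_{b-1}\bigr)$ vanishes as $n$ tends to infinity.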

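We finally consider the error event $(\mathcal{E}_{b}^{(+)} \cap \mathcal{F}_{b-1})$. This event occurs if the receiver finds an $\hat{m}_{b-1} \ne m_{b-1}$ that satisfies

$$\bigl(x_1^n(\hat{m}_{b-1}, m_{b-2}; b-1),\, x_{r,1}^n(m_{b-2}; b-1),\, y_1^n(b-1)\bigr) \in \mathcal{A}_\epsilon(X_1^n, X_{r,1}^n, Y_1^n) \tag{114}$$

$$\bigl(x_{r,1}^n(\hat{m}_{b-1}; b),\, y_1^n(b)\bigr) \in \mathcal{A}_\epsilon(X_{r,1}^n, Y_1^n). \tag{115}$$

Since we have $\hat{m}_{b-1} \ne m_{b-1}$, it follows that

$$\bigl(X_1^n(\hat{m}_{b-1}, m_{b-2}; b-1),\, X_{r,1}^n(m_{b-2}; b-1),\, Y_1^n(b-1),\, X_{r,1}^n(\hat{m}_{b-1}; b),\, Y_1^n(b)\bigr)$$

is distributed according to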

P X? A {b-l){-)Px?{b-\)\X? A {b-\){-)PY?{b-^^ 

where the argument after the random vector indicates whether the vector belongs to block $b$ or $(b-1)$. Extending [12, Ths. 14.2.1 & 14.2.3] to the above channel, we obtain for every $\hat{m}_{b-1} \ne m_{b-1}$

$$\begin{aligned}
&\Pr\bigl(\text{(114) \& (115) are satisfied}\bigr) \\
&\quad \le \exp\Bigl( -n\Bigl( \lim_{n\to\infty} \frac{1}{n} I\bigl(X_1^n(b-1); Y_1^n(b-1) \bigm| X_{r,1}^n(b-1)\bigr) + \lim_{n\to\infty} \frac{1}{n} I\bigl(X_{r,1}^n(b); Y_1^n(b)\bigr) - 10\epsilon \Bigr) \Bigr) \\
&\qquad \times \exp\Bigl( n \frac{1}{n} I\bigl(Y_1^n(b); X_{r,1}^n(b-1), Y_1^n(b-1)\bigr) \Bigr).
\end{aligned} \tag{116}$$

Since the codebook construction does not depend on the block $b$, and since the channel is stationary, it follows that

$$\lim_{n\to\infty} \frac{1}{n} I\bigl(X_1^n(b-1); Y_1^n(b-1) \bigm| X_{r,1}^n(b-1)\bigr) = \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_1^n \mid X_{r,1}^n) \tag{117}$$

³ To this end, we need to replace the entropies in the proof of [12, Th. 14.2.3] by the corresponding differential entropy rates.
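and

$$\lim_{n\to\infty} \frac{1}{n} I\bigl(X_{r,1}^n(b); Y_1^n(b)\bigr) = \lim_{n\to\infty} \frac{1}{n} I(X_{r,1}^n; Y_1^n) \tag{118}$$

do not depend on $b$. We next show that

$$\lim_{n\to\infty} \frac{1}{n} I\bigl(Y_1^n(b); X_{r,1}^n(b-1), Y_1^n(b-1)\bigr) = 0. \tag{119}$$

To this end, we first note that, by the stationarity of $\bigl(Y_1^n(b), X_{r,1}^n(b-1), Y_1^n(b-1)\bigr)$, we have

$$I\bigl(Y_1^n(b); X_{r,1}^n(b-1), Y_1^n(b-1)\bigr) = I\bigl(Y_1^n; X_{r,-n+1}^0, Y_{-n+1}^0\bigr). \tag{120}$$

This can be upper-bounded by

$$\begin{aligned}
I\bigl(Y_1^n; X_{r,-n+1}^0, Y_{-n+1}^0\bigr) &\le I\bigl(\{H_{2,\ell}X_\ell + H_{3,\ell}X_{r,\ell}\}_{\ell=1}^{n};\, X_{r,-n+1}^0, \{H_{2,\ell}X_\ell + H_{3,\ell}X_{r,\ell}\}_{\ell=-n+1}^{0}\bigr) \\
&\le I\bigl(H_{2,1}^n, H_{3,1}^n, X_1^n, X_{r,1}^n;\, H_{2,-n+1}^0, H_{3,-n+1}^0, X_{-n+1}^0, X_{r,-n+1}^0\bigr) \\
&= I\bigl(H_{2,1}^n, H_{3,1}^n;\, H_{2,-n+1}^0, H_{3,-n+1}^0\bigr) \\
&= I\bigl(H_{2,1}^n; H_{2,-n+1}^0\bigr) + I\bigl(H_{3,1}^n; H_{3,-n+1}^0\bigr) \\
&\le I\bigl(H_{2,1}^n; H_{2,-\infty}^0\bigr) + I\bigl(H_{3,1}^n; H_{3,-\infty}^0\bigr)
\end{aligned} \tag{121}$$

where the first two steps follow from the Data Processing Inequality [12, Th. 2.8.1]; the third step follows because the processes $\{X_k, k \in \mathbb{Z}\}$ and $\{X_{r,k}, k \in \mathbb{Z}\}$ are i.i.d. and independent of $\{H_{2,k}, k \in \mathbb{Z}\}$ and $\{H_{3,k}, k \in \mathbb{Z}\}$; the fourth step follows because the processes $\{H_{2,k}, k \in \mathbb{Z}\}$ and $\{H_{3,k}, k \in \mathbb{Z}\}$ are independent; and the last step follows because adding observations does not decrease mutual information. It follows by the chain rule for mutual information and by Cesàro's mean [12, Th. 4.2.3] that

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} I\bigl(H_{2,1}^n; H_{2,-\infty}^0\bigr) &= \lim_{k\to\infty} I\bigl(H_{2,k}; H_{2,-\infty}^0 \bigm| H_{2,1}^{k-1}\bigr) \\
&= \lim_{k\to\infty} h\bigl(H_{2,k} \bigm| H_{2,1}^{k-1}\bigr) - \lim_{k\to\infty} h\bigl(H_{2,k} \bigm| H_{2,-\infty}^{k-1}\bigr) \\
&= h(\{H_{2,k}\}) - h(\{H_{2,k}\}) \\
&= 0
\end{aligned} \tag{122}$$

where the third step follows from the stationarity of $\{H_{2,k}, k \in \mathbb{Z}\}$ [12, Th. 4.2.1]. In the same way, it can be shown that

$$\lim_{n\to\infty} \frac{1}{n} I\bigl(H_{3,1}^n; H_{3,-\infty}^0\bigr) = 0. \tag{123}$$

Combining (120)-(123) proves (119). We thus obtain from (116)-(119) that

$$\Pr\bigl(\text{(114) \& (115) are satisfied}\bigr) \le \exp\Bigl( -n\Bigl( \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_1^n \mid X_{r,1}^n) + \lim_{n\to\infty} \frac{1}{n} I(X_{r,1}^n; Y_1^n) - 10\epsilon \Bigr) \Bigr). \tag{124}$$

By the union bound, it follows that

$$\begin{aligned}
\Pr\bigl(\mathcal{E}_{b}^{(+)} \cap \mathcal{F}_{b-1}\bigr) &\le \sum_{\hat{m}_{b-1} \ne m_{b-1}} \exp\Bigl( -n\Bigl( \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_1^n \mid X_{r,1}^n) + \lim_{n\to\infty} \frac{1}{n} I(X_{r,1}^n; Y_1^n) - 10\epsilon \Bigr) \Bigr) \\
&\le \exp\Bigl( n\Bigl( R - \lim_{n\to\infty} \frac{1}{n} I(X_1^n; Y_1^n \mid X_{r,1}^n) - \lim_{n\to\infty} \frac{1}{n} I(X_{r,1}^n; Y_1^n) + 10\epsilon \Bigr) \Bigr) \\
&= \exp\Bigl( n\Bigl( R - \lim_{n\to\infty} \frac{1}{n} I(X_1^n, X_{r,1}^n; Y_1^n) + 10\epsilon \Bigr) \Bigr)
\end{aligned} \tag{125}$$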
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which implies that Pr^<^ + ^ n -^b-ij vanishes as n tends to infinity, provided that 

R< lim -iiXftX^YT 1 ) - lOe. (126) 
It thus follows from (126) and (113) that for every block b the rate 

i? = minJ lim -l(X™;Y r \ I lim i5 *T) i ( 12 ?) 

is achievable. Consequently, the complete message $m$ can be transmitted over $B + 1$ blocks with an
overall rate

\[
\frac{nRB}{n(B+1)} = \frac{BR}{B+1}. \tag{128}
\]

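By letting $B$ tend to infinity, it follows that for every product distribution on $(X_1^n, X_{r,1}^n)$, i.e.,

\[
P_{X_1^n, X_{r,1}^n}(\cdot) = \prod_{k=1}^{n} P_{X,X_r}(\cdot)
\]

we can achieve the rate

\[
R = \min\Bigl\{ \lim_{n\to\infty} \frac{1}{n} I\bigl(X_1^n; Y_{r,1}^n \mid X_{r,1}^n\bigr), \; \lim_{n\to\infty} \frac{1}{n} I\bigl(X_1^n, X_{r,1}^n; Y_1^n\bigr) \Bigr\}.
\]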

This proves Proposition 6. 
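The rate loss in (128) caused by spreading $B$ message blocks over $B + 1$ transmission blocks can be made concrete with a short calculation. The following is a minimal sketch. The function name `overall_rate` and the per-block rate of 1.0 are illustrative assumptions, not quantities from the text.

```python
def overall_rate(R, B):
    """Overall rate nRB / (n(B+1)) = BR / (B+1) when B blocks are sent over B+1 blocks."""
    return B * R / (B + 1)

R = 1.0  # assumed per-block rate, in arbitrary units
for B in (1, 10, 100, 1000):
    # The loss R - overall_rate(R, B) equals R / (B + 1) and vanishes as B grows.
    print(B, overall_rate(R, B), R - overall_rate(R, B))
```

The loss equals $R/(B+1)$, so it can be made arbitrarily small by choosing $B$ large, which is why the factor $B/(B+1)$ disappears in the limit.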

C The limit of $\epsilon_2(\mathrm{SNR}, \kappa)$

In the following we show that $\epsilon_2(\mathrm{SNR}, \kappa)$ tends to zero as SNR tends to infinity. The proof follows
along the same lines as the proof in [3, Appendix IX].

We first note that $\epsilon_2(\mathrm{SNR}, \kappa) \ge 0$. Thus, it suffices to show that $\epsilon_2(\mathrm{SNR}, \kappa) \le o(1)$. We have

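\begin{align*}
\epsilon_2(\mathrm{SNR}, \kappa)
&= I\bigl(X_k, X_{r,k}; H_{3,k-\kappa}^{k-1} \mid Y_{k-\kappa}^{k}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) \\
&= h\bigl(H_{3,k-\kappa}^{k-1} \mid Y_{k-\kappa}^{k}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k-\kappa}^{k-1} \mid Y_{k-\kappa}^{k}, X_{k-\kappa}^{k}, X_{r,k-\kappa}^{k}\bigr) \\
&\le h\bigl(H_{3,k-\kappa}^{k-1} \mid Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k-\kappa}^{k-1} \mid Y_{k-\kappa}^{k}, X_{k-\kappa}^{k}, X_{r,k-\kappa}^{k}, H_{3,k}\bigr) \\
&= h\bigl(H_{3,k-\kappa}^{k-1} \mid Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k-\kappa}^{k-1} \mid Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}, H_{3,k}\bigr) \\
&= I\bigl(H_{3,k-\kappa}^{k-1}; H_{3,k} \mid Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) \tag{129}
\end{align*}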

where the third step follows because conditioning cannot increase entropy; and the fourth step
follows because, conditioned on $(Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}, H_{3,k})$, the fading coefficients $H_{3,k-\kappa}^{k-1}$ are
independent of $(Y_k, X_k, X_{r,k})$.

Expressing the mutual information $I\bigl(H_{3,k-\kappa}^{k-1}; H_{3,k} \mid Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr)$ as the difference of
two conditional differential entropies of $H_{3,k}$ yields

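\begin{align*}
\epsilon_2(\mathrm{SNR}, \kappa)
&\le h\bigl(H_{3,k} \mid Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k} \mid Y_{k-\kappa}^{k-1}, H_{3,k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) \\
&= h\bigl(H_{3,k} \mid Y_{k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k} \mid H_{3,k-\kappa}^{k-1}\bigr) \\
&= h\Bigl(H_{3,k} \Bigm| \Bigl\{H_{2,\ell} \frac{X_\ell}{X_{r,\ell}} + H_{3,\ell} + \frac{Z_\ell}{X_{r,\ell}}\Bigr\}_{\ell=k-\kappa}^{k-1}, X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}\Bigr) - h\bigl(H_{3,k} \mid H_{3,k-\kappa}^{k-1}\bigr) \\
&\le h\bigl(H_{3,k} \mid \{H_{3,\ell} + \zeta W_\ell\}_{\ell=k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k} \mid H_{3,k-\kappa}^{k-1}\bigr) \tag{130}
\end{align*}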

where $\{W_k, k \in \mathbb{Z}\}$ is a sequence of i.i.d., zero-mean, unit-variance, circularly-symmetric, complex
Gaussian random variables, and where

C A *tl + — = p -2P A -2(/J- a ) -2/3 

<> ,2/3 + ,2/3 ^ ^ A 2/3- 
The second step in (130) follows because, conditioned on $H_{3,k-\kappa}^{k-1}$, the present fading $H_{3,k}$ is
independent of $(X_{k-\kappa}^{k-1}, X_{r,k-\kappa}^{k-1}, Y_{k-\kappa}^{k-1})$; and the last step in (130) follows because the first differential
entropy is maximized for $|X_\ell|^2 = A^{2\alpha}$ and $|X_{r,\ell}|^2 = A^{2\beta}$, in which case

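\[
\Bigl\{H_{2,\ell} \frac{X_\ell}{X_{r,\ell}} + H_{3,\ell} + \frac{Z_\ell}{X_{r,\ell}}\Bigr\}_{\ell=k-\kappa}^{k-1}
\]

has the same law as $\{H_{3,\ell} + \zeta W_\ell\}_{\ell=k-\kappa}^{k-1}$. Noting that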
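\begin{align*}
& h\bigl(H_{3,k} \mid \{H_{3,\ell} + \zeta W_\ell\}_{\ell=k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k} \mid H_{3,k-\kappa}^{k-1}\bigr) \\
&\quad = h\bigl(H_{3,k}, \{H_{3,\ell} + \zeta W_\ell\}_{\ell=k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k}, H_{3,k-\kappa}^{k-1}\bigr) - I\bigl(\{H_{3,\ell} + \zeta W_\ell\}_{\ell=k-\kappa}^{k-1}; W_{k-\kappa}^{k-1}\bigr) \\
&\quad \le h\bigl(H_{3,k}, \{H_{3,\ell} + \zeta W_\ell\}_{\ell=k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k}, H_{3,k-\kappa}^{k-1}\bigr) \tag{131}
\end{align*}

we obtain

\[
\epsilon_2(\mathrm{SNR}, \kappa) \le h\bigl(H_{3,k}, \{H_{3,\ell} + \zeta W_\ell\}_{\ell=k-\kappa}^{k-1}\bigr) - h\bigl(H_{3,k}, H_{3,k-\kappa}^{k-1}\bigr). \tag{132}
\]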

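The claim now follows by [3, Lemma 6.11], which states that if $\mathbf{H} \in \mathbb{C}^{\nu}$ is a random vector of finite
Frobenius norm and finite differential entropy, and if $\mathbf{W} \in \mathbb{C}^{\nu}$ is a Gaussian random vector that is
independent of $\mathbf{H}$, then

\[
\lim_{\sigma^2 \downarrow 0} \bigl\{h(\mathbf{H} + \sigma^2 \mathbf{W}) - h(\mathbf{H})\bigr\} = 0.
\]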
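The continuity statement of the lemma can be illustrated in the one case where both entropies are available in closed form. The sketch below assumes that $\mathbf{H}$ is itself a zero-mean, circularly-symmetric complex Gaussian vector with covariance matrix `K`. This is an illustrative special case only, since the lemma covers general laws of finite differential entropy. The parameter `s` is a generic noise scale standing in for the vanishing scalar in the lemma, and the helper names are hypothetical. Under these assumptions the entropy gap equals $\log\det(K + s^2 I) - \log\det K$.

```python
import math

def logdet_spd(K):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            acc = K[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(acc)
                total += 2.0 * math.log(L[i][j])
            else:
                L[i][j] = acc / L[j][j]
    return total

def entropy_gap(K, s):
    """h(H + s W) - h(H) in nats for H ~ CN(0, K) and W ~ CN(0, I) independent of H."""
    nu = len(K)
    # The sum of independent Gaussians has covariance K + s^2 I; the (pi e)^nu
    # factors of the two entropies cancel in the difference.
    noisy = [[K[i][j] + (s * s if i == j else 0.0) for j in range(nu)]
             for i in range(nu)]
    return logdet_spd(noisy) - logdet_spd(K)

# Assumed covariance: three consecutive samples of a correlated fading process.
K = [[0.9 ** abs(i - j) for j in range(3)] for i in range(3)]
for s in (1.0, 0.1, 0.01, 0.001):
    print(s, entropy_gap(K, s))
```

The gap is positive and shrinks monotonically as the noise scale decreases, in line with the limit stated in the lemma.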